Friday 1 June 2007

Case sensitive searching in SQL Server

I needed to search some text for a particular capitalisation of our company name - we're told by our lawyers we weaken our trademark by not being consistent in capitalisation!

Like many people, our default collation for our databases is case-insentitive - we use SQL_Latin1_General_CP1_CI_AS. So the "LIKE" operator will operate in a case-insensitive way. However, I didn't want this, so I came up with a scheme to do this.

As an illustration, here's something that works with the example "Northwind" database:


CREATE PROC SearchProd(
@match_ins varchar(40) = '',
@notmatch_ins varchar(40) = '',
@match_sens varchar(40) = '',
@notmatch_sens varchar(40) = '') AS

CREATE TABLE #prodtemp(ProductID int, ProductNameSens nvarchar(40) collate SQL_Latin1_General_Cp1250_CS_AS)

INSERT INTO #prodtemp(ProductID, ProductNameSens)
SELECT ProductID, ProductName
FROM Products

SELECT Products.ProductID, ProductName
FROM Products
JOIN #prodtemp ON #prodtemp.ProductID = Products.ProductID
WHERE (ProductName LIKE '%' + @match_ins + '%') AND
(@notmatch_ins = '' OR ProductName NOT LIKE '%' + @notmatch_ins + '%') AND
(ProductNameSens LIKE '%' + @match_sens + '%') AND
(@notmatch_sens = '' OR ProductNameSens NOT LIKE '%' + @notmatch_sens + '%')
ORDER BY ProductName

DROP TABLE #prodtemp

GO


A few points to note here:
  • Use of the temporary table with the explicit "SQL_Latin1_General_Cp1250_CS_AS" for ProductNameSens. Using the "LIKE" operator with that field results in it behaving in a case sensitive way.
  • I often use that @match_ins varchar(40) = '' type of declaration to set up "wildcard" arguments that can be defaulted so they don't impose any limits on the recordset returned if they aren't supplied.
  • Note the (@notmatch_ins = '' OR ProductName NOT LIKE '%' + @notmatch_ins + '%') type constructs - there needs to be an initial test in the "negative" case to see if the argument should just not be used.
So:
exec SearchProd @match_ins = 'Ro'
is a case-insensitive search, and returns six records for me with the Northwind data, while
exec SearchProd @match_sens = 'Ro'
is case-sensitive, and returns four records.

This might not be the most optimal way to do this, and it should only be used for moderately small tables (it generates a lot of table and/or index scans typically!)

Has anyone come across a better way to do this?

Getting started ...

Thought it was about time I shared some "pearls of wisdom" I come across in my daily work, and this felt like a good vehicle for this. I guess like most bloggers, I'm starting out with the best of intentions of posting regularly, but on the other hand, I'm not fooling myself! I hope that I'll come across enough interesting bits and pieces to encourage me to post at least occasionally though.

I do believe there is too much email about these days though, so I'll not be posting unless I think I've got something at least moderately interesting to say!

So if you do drop by, please consider leaving a message - particularly if you find anything here interesting.