Wednesday 18 August 2010

Full-text indexing problems

I ran into a problem recently setting up full-text indexing on a test version of a database, cloned from a production database schema. The initial problem was that the original index was on a drive letter that didn't exist on the target machine. After sorting that out, which involved a little bit of system table manipulation, I then ran into another problem trying to rebuild the indexes. I got as far as a stored procedure call like this:

exec sp_fulltext_table N'[dbo].[credit]', N'create', N'exp_main', N'PK_credit'

which gave:

Server: Msg 2526, Level 16, State 3, Procedure sp_fulltext_table, Line 122
Incorrect DBCC statement. Check the documentation for the correct DBCC syntax and options.


I followed the advice described here:

http://www.eggheadcafe.com/forumarchives/SQLServerfulltext/Aug2005/post23469255.asp

In particular, manually reinstalling the MSSearch / full text search facility as described here:

http://support.microsoft.com/default.aspx?scid=kb;EN-US;827449

... however, it still didn't work.

The issue was resolved when I realised it was something to do with the sp_fulltext_table procedure. In the course of my messing about, it had lost it's "system" attribute - see:

http://www.informit.com/articles/article.aspx?p=25288&seqNum=6

As mentioned here, "DBCC CALLFULLTEXT" needs this to be set to work properly.

Sure enough looking at master.dbo.sysobjects.status for sp_fulltext_table gave
1610612737, which is 0x60000001. This isn't really documented in any detail anywhere, but I found from some more messing about and looking at the source text of the sp_MS_marksystemobject procedure, I found that it actually sets the top two bits in the status field (i.e. 0xC0000000) - so I ran:

sp_MS_marksystemobject 'sp_fulltext_table'

and it all started working.

Friday 23 January 2009

Validating IMEI numbers

Technorati Tags: ,,


I was recently working with a database which included a lot of serial numbers, which in some cases where IMEI numbers of phones, and I wanted to be able to identify the cases that looked like valid IMEIs.

There is some discussion of the format at http://en.wikipedia.org/wiki/IMEI. What I wanted to do was just to validate the IMEI (as opposed to IMEI/SV) format – 14 digits plus the Luhn check digit.

Here’s what I ended up with – a function to return a bit value of 1 if the supplied value is valid, and 0 if not:

CREATE FUNCTION ValidIMEI(@imei varchar(24)) RETURNS bit AS

BEGIN

DECLARE @res    bit
DECLARE @i      int
DECLARE @digval int
DECLARE @total  int

SET @res = 0
SET @total = 0

IF LEN(@imei) = 15 BEGIN
  SET @i = 1
  WHILE @i <= 14 BEGIN

    SET @digval =
        (ASCII(SUBSTRING(@imei, @i, 1)) - 
         ASCII('0')) * (2 - (@i % 2)) 
                   -- Double each even digit
    SET @total =
          @total + @digval / 10 + @digval % 10

    SET @i = @i + 1
  END

  IF SUBSTRING(@imei, 15, 1) = 
       CHAR((1000 - @total) % 10 + 
          ASCII('0')) SET @res = 1

END

RETURN(@res)

END


So, for example:

select dbo.validimei('000000000000000'), dbo.validimei('000000000000001')

returns 1 and 0 (although clearly these are unlikely to be valid IMEIs, since this just validates the length and check digit and not necessarily that the various fields of the alleged IMEI number are actually valid. Still however, it proved reasonably useful, and actually ran tolerably quickly, validation of half a million IMEIs in a few seconds.

Wednesday 24 December 2008

I'm getting published ...

I'm pleased to say that I've had an article accepted for the rather excellent www.sqlservercentral.com - scheduled for publication on 29th Dec, entitled "Scripting INSERTs"

Sunday 22 June 2008

A couple of free software picks

More than a little frustrated with Windows Notepad a little while ago, I thought I must be able to find something better. Occasionally I find myself editing text files in the Delphi editor just because of some of the features it gives me. I was even prepared to pay (well, a sensible shareware price anyway), but it turned out not to be necessary.

I was looking to tick a few boxes:

  • Multiple files open at a time
  • Cut and paste of rectangular block
  • Regular expression search/replace
  • Some kind of macro replay facility.

Casting around a bit, I found Notepad++ on SourceForge -
http://notepad-plus.sourceforge.net/uk/site.htm – which ticks those boxes, and a whole lot more, including syntax highlighting for a long list of languages (including SQL and Pascal), folding capabilities (particularly useful for expanding and collapsing sections of XML files), side-by-side compare of two edit buffers, and lot more.


The second one is a well-known (to some, anyway) FTP solution, FileZilla (http://filezilla-project.org/). Both client and server software, with support for secure FTP. Nothing too flash, but it just works.

Sunday 13 April 2008

Looking after your data with TrueCrypt

One tool I've found useful recently is something that will be known to some people, but I think deserves to be better known. It's a utility called TrueCrypt, which provides convenient industrial-strength encryption for your data. I'd been trying to maintain a policy of not putting anything "sensitive" on removable media that I might conceivably misplace at some time (e.g. USB keys and disk drives), but I was still nervous about what might inadvertently be on there if I did lose something like that.

I came across TrueCrypt (www.truecrypt.org) first on the rather interesting "Security Now" netcast (http://www.twit.tv/sn). It was first described in episode 41 (http://www.twit.tv/sn41), and episode 133 (http://www.twit.tv/sn133) brings it up to date.

TrueCrypt is a piece of freeware that allows you to create a large encrypted file that you can mount as a disk partition under control of a password and optionally one or more "key files". In the simplest case, to mount the partition, you just need to supply the password. It supports what they call "traveller mode", where you can set up a USB key for example so it has the TrueCrypt software on it, and you get prompted for the password just after you insert it. You then end up with two partitions mounted - the USB key itself with the large file visible, and the "virtual" partition with the your filesystem available.

There is a bit of a "gotcha" here - you can't mount the filesystem on a system without TrueCrypt installed if you don't have administrator access. For that reason, if I've got a 2GB USB key for example, I usually leave 10 or 20 MB free for the occasional "sneakernet" use, and dedicate the rest for the TrueCrypt partition.

I have some experience of how well it scales - I've got a 400GB SATA drive in an external enclosure, and I've got two 200GB TrueCrypt partitions on it.

The combination of TrueCrypt with an external hard drive I think is a good option for offsite backups. The level of encryption stands up to the common regulatory standards - for example, we process credit card transactions so we come under the requirements of PCI (payment card industry) standards (https://www.pcisecuritystandards.org).

TrueCrypt version 5 adds another capability - whole drive encryption. You can install it on your laptop and let it encrypt the disk drive. Losing a laptop is bad enough (and I've been through that once!), but depending on what you have on it, the thought that someone might have access to your data, and possibly your ID information (banking, PayPal etc.) might be a whole lot worse.

There are some alternatives to TrueCrypt, e.g. http://www.ce-infosys.com/english/downloads/free_compusec/index.html for whole disk encryption, but I don't feel the need to change now!

Friday 11 April 2008

Trials and tribulations of SQL Server 2000 SP4

I was recently on the receiving end of that horrible sinking feeling for anyone involved with databases - a mission-critical production SQL Server just stopping doing its thing. Eventually, I ended up killing the SQLSERVR.EXE process in the task manager, and it came up again after some rollbacks and roll forwards.


Delving into the logs, I found a couple of things:
  • There were a few recent instances of access violations which matched this description pretty closely: http://support.microsoft.com/kb/821548. The problem was with parallel query execution in the ParallelPageSharedData::AddRef function. In the text dump files in the MSSQL directory, I found the query producing the access violation, and indeed it was reproducible. Interestingly, it still returned correct results though.

  • One of the occurrences of this came about two minutes before the first of several messages along the lines of "The Scheduler 0 appears to be hung. SPID 7, ECID 0, UMS Context 0x045530B8." This is a four-processor hyper-threaded machine, and the eight schedulers dropped off one by one until the server stopped. This looked very much like http://support.microsoft.com/kb/815056.

Things were back up and running for the time being, but clearly we couldn't let this happen again. The conclusion was to install service pack 4:


http://support.microsoft.com/kb/888799. After a couple of days running on one of our test servers, and some general testing, we were ready to come in at the weekend and install it on the products server.


All went well with the installation, and things seemed to be running. First sign of trouble was some slowness when one of the remote offices started work late Sunday evening. By the time the server was under the normal load on Monday morning, it was clear things weren't going well. Things were slow all round, and the disk I/O queues on the server data partition were looking terrible. Clearly something had changed, and looking at the list of bug fixes subsequent to SP4, it looked like there was some scope for whatever was causing the problem to be fixed: http://support.microsoft.com/kb/894905. In particular, we were clearly suffering from our 16GB server only using 8GB of memory (http://support.microsoft.com/kb/899761) when it usually uses around 14GB.


This was particularly worrying, because it was clear going into the SP4 installation that rolling it back was not going to be easy. Of course, we'd backed up the master, model and msdb databases before the installation, but it did look like rolling back would be quite painful and time-consuming, with downtime we would really want to avoid.

I eventually fastened on a hotfix rollup to bring SQL Server to build 2187: http://support.microsoft.com/kb/916287. Things were a little beyond the stage of careful, reasoned testing, so after running the rollup on the test server for a couple of hours, we scheduled some downtime and installed the rollup. Thing improved immediately, and continued to be back to the state before the nightmare that is SP4 was installed.


I was more than a little surprised that there didn't seem to be much about these particular problems with SP4 on the web. It does perhaps only affect reasonably big servers - lots of memory and multiple processors - but even so, there wasn't really a lot to warn us off the idea that SP4 would be a stable, tested update. Something of a lesson learned - a day of barely-suppressed panic to get back to where we were at the start of the day - although I'm not sure that it would have been easy to be any wiser before the event.

Friday 1 June 2007

Case sensitive searching in SQL Server

I needed to search some text for a particular capitalisation of our company name - we're told by our lawyers we weaken our trademark by not being consistent in capitalisation!

Like many people, our default collation for our databases is case-insentitive - we use SQL_Latin1_General_CP1_CI_AS. So the "LIKE" operator will operate in a case-insensitive way. However, I didn't want this, so I came up with a scheme to do this.

As an illustration, here's something that works with the example "Northwind" database:


CREATE PROC SearchProd(
@match_ins varchar(40) = '',
@notmatch_ins varchar(40) = '',
@match_sens varchar(40) = '',
@notmatch_sens varchar(40) = '') AS

CREATE TABLE #prodtemp(ProductID int, ProductNameSens nvarchar(40) collate SQL_Latin1_General_Cp1250_CS_AS)

INSERT INTO #prodtemp(ProductID, ProductNameSens)
SELECT ProductID, ProductName
FROM Products

SELECT Products.ProductID, ProductName
FROM Products
JOIN #prodtemp ON #prodtemp.ProductID = Products.ProductID
WHERE (ProductName LIKE '%' + @match_ins + '%') AND
(@notmatch_ins = '' OR ProductName NOT LIKE '%' + @notmatch_ins + '%') AND
(ProductNameSens LIKE '%' + @match_sens + '%') AND
(@notmatch_sens = '' OR ProductNameSens NOT LIKE '%' + @notmatch_sens + '%')
ORDER BY ProductName

DROP TABLE #prodtemp

GO


A few points to note here:
  • Use of the temporary table with the explicit "SQL_Latin1_General_Cp1250_CS_AS" for ProductNameSens. Using the "LIKE" operator with that field results in it behaving in a case sensitive way.
  • I often use that @match_ins varchar(40) = '' type of declaration to set up "wildcard" arguments that can be defaulted so they don't impose any limits on the recordset returned if they aren't supplied.
  • Note the (@notmatch_ins = '' OR ProductName NOT LIKE '%' + @notmatch_ins + '%') type constructs - there needs to be an initial test in the "negative" case to see if the argument should just not be used.
So:
exec SearchProd @match_ins = 'Ro'
is a case-insensitive search, and returns six records for me with the Northwind data, while
exec SearchProd @match_sens = 'Ro'
is case-sensitive, and returns four records.

This might not be the most optimal way to do this, and it should only be used for moderately small tables (it generates a lot of table and/or index scans typically!)

Has anyone come across a better way to do this?