Sunday 13 April 2008

Looking after your data with TrueCrypt

One tool I've found useful recently is something that will be known to some people, but I think deserves to be better known. It's a utility called TrueCrypt, which provides convenient industrial-strength encryption for your data. I'd been trying to maintain a policy of not putting anything "sensitive" on removable media that I might conceivably misplace at some time (e.g. USB keys and disk drives), but I was still nervous about what might inadvertently be on there if I did lose something like that.

I came across TrueCrypt (www.truecrypt.org) first on the rather interesting "Security Now" netcast (http://www.twit.tv/sn). It was first described in episode 41 (http://www.twit.tv/sn41), and episode 133 (http://www.twit.tv/sn133) brings it up to date.

TrueCrypt is a piece of freeware that allows you to create a large encrypted file that you can mount as a disk partition, protected by a password and optionally one or more "key files". In the simplest case, to mount the partition, you just need to supply the password. It supports what they call "traveller mode", where you can set up a USB key, for example, so that it has the TrueCrypt software on it and you get prompted for the password just after you insert it. You then end up with two partitions mounted - the USB key itself with the large file visible, and the "virtual" partition with your filesystem available.

There is a bit of a "gotcha" here - on a system that doesn't already have TrueCrypt installed, you can't mount the filesystem unless you have administrator access. For that reason, if I've got a 2GB USB key for example, I usually leave 10 or 20 MB free for the occasional "sneakernet" use, and dedicate the rest to the TrueCrypt partition.
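As a rough illustration of that split, here is a minimal Python sketch of the arithmetic. The function name and the idea of treating a 2GB key as a nominal 2000 MB are assumptions for the example. The usable capacity of a real key will be somewhat lower, and the reserve also has to cover the TrueCrypt software itself if you use traveller mode.

```python
def container_size_mb(capacity_mb, reserve_mb=20):
    """Size (in MB) for the encrypted container file, leaving
    reserve_mb of the key unencrypted for ordinary file transfers.

    Hypothetical helper: capacity_mb is whatever the key really
    offers once formatted, not necessarily the figure on the label.
    """
    if not 0 <= reserve_mb < capacity_mb:
        raise ValueError("reserve must be non-negative and smaller than the key")
    return capacity_mb - reserve_mb


# A nominal 2000 MB key with the 20 MB reserve mentioned above.
print(container_size_mb(2000))      # 1980
# The smaller 10 MB reserve.
print(container_size_mb(2000, 10))  # 1990
```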

I have some experience of how well it scales - I've got a 400GB SATA drive in an external enclosure, and I've got two 200GB TrueCrypt partitions on it.

I think the combination of TrueCrypt and an external hard drive is a good option for offsite backups. The level of encryption stands up to the common regulatory standards - for example, we process credit card transactions, so we come under the requirements of the PCI (payment card industry) standards (https://www.pcisecuritystandards.org).

TrueCrypt version 5 adds another capability - whole drive encryption. You can install it on your laptop and let it encrypt the disk drive. Losing a laptop is bad enough (and I've been through that once!), but depending on what you have on it, the thought that someone might have access to your data, and possibly your ID information (banking, PayPal etc.) might be a whole lot worse.

There are some alternatives to TrueCrypt, e.g. http://www.ce-infosys.com/english/downloads/free_compusec/index.html for whole disk encryption, but I don't feel the need to change now!

Friday 11 April 2008

Trials and tribulations of SQL Server 2000 SP4

I was recently on the receiving end of that horrible sinking feeling for anyone involved with databases - a mission-critical production SQL Server just stopping doing its thing. Eventually, I ended up killing the SQLSERVR.EXE process in the task manager, and it came up again after some rollbacks and roll forwards.


Delving into the logs, I found a couple of things:
  • There were a few recent instances of access violations which matched this description pretty closely: http://support.microsoft.com/kb/821548. The problem was with parallel query execution in the ParallelPageSharedData::AddRef function. In the text dump files in the MSSQL directory, I found the query producing the access violation, and indeed it was reproducible. Interestingly, it still returned correct results though.

  • One of the occurrences of this came about two minutes before the first of several messages along the lines of "The Scheduler 0 appears to be hung. SPID 7, ECID 0, UMS Context 0x045530B8." This is a four-processor hyper-threaded machine, and the eight schedulers dropped off one by one until the server stopped. This looked very much like http://support.microsoft.com/kb/815056.
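If you want to see the order in which the schedulers dropped out, a small script can pull the "appears to be hung" messages out of the error log. The following is only a sketch in Python: the regular expression is built from the one message quoted above, the function name is made up, and the second sample line is invented for illustration. Reading the actual log file is left out, since its location and encoding depend on the installation.

```python
import re

# Pattern modelled on the single message quoted in the post; other
# builds may word the message differently.
HUNG_RE = re.compile(
    r"The Scheduler (?P<scheduler>\d+) appears to be hung\. "
    r"SPID (?P<spid>\d+), ECID (?P<ecid>\d+), "
    r"UMS Context (?P<context>0x[0-9A-Fa-f]+)"
)


def hung_schedulers(lines):
    """Return scheduler numbers in the order they were first reported hung."""
    seen = []
    for line in lines:
        match = HUNG_RE.search(line)
        if match:
            scheduler = int(match.group("scheduler"))
            if scheduler not in seen:
                seen.append(scheduler)
    return seen


sample = [
    "The Scheduler 0 appears to be hung. SPID 7, ECID 0, UMS Context 0x045530B8.",
    "Some unrelated log line.",
    # Invented line, for illustration only.
    "The Scheduler 3 appears to be hung. SPID 12, ECID 0, UMS Context 0x04553AB0.",
    "The Scheduler 0 appears to be hung. SPID 7, ECID 0, UMS Context 0x045530B8.",
]
print(hung_schedulers(sample))  # [0, 3]
```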

Things were back up and running for the time being, but clearly we couldn't let this happen again. The conclusion was to install service pack 4: http://support.microsoft.com/kb/888799. After a couple of days running on one of our test servers, and some general testing, we were ready to come in at the weekend and install it on the production server.


All went well with the installation, and things seemed to be running. First sign of trouble was some slowness when one of the remote offices started work late Sunday evening. By the time the server was under the normal load on Monday morning, it was clear things weren't going well. Things were slow all round, and the disk I/O queues on the server data partition were looking terrible. Clearly something had changed, and looking at the list of bug fixes subsequent to SP4, it looked like there was some scope for whatever was causing the problem to be fixed: http://support.microsoft.com/kb/894905. In particular, we were clearly suffering from our 16GB server only using 8GB of memory (http://support.microsoft.com/kb/899761) when it usually uses around 14GB.


This was particularly worrying, because it was clear going into the SP4 installation that rolling it back was not going to be easy. Of course, we'd backed up the master, model and msdb databases before the installation, but it did look like rolling back would be quite painful and time-consuming, with downtime we would really want to avoid.

I eventually fastened on a hotfix rollup to bring SQL Server to build 2187: http://support.microsoft.com/kb/916287. Things were a little beyond the stage of careful, reasoned testing, so after running the rollup on the test server for a couple of hours, we scheduled some downtime and installed the rollup. Things improved immediately, and the server has stayed back in the state it was in before the nightmare that is SP4 was installed.
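One way to confirm which of these states a server is in is to look at its version string (for example the value returned by SELECT SERVERPROPERTY('ProductVersion')) and compare the build number. The Python sketch below makes two assumptions: that the string has the form 8.00.NNNN, and that plain SP4 is build 2039, which is my understanding rather than something stated in the KB articles above. The function names are made up for the example.

```python
SP4_BUILD = 2039     # assumed build number for SQL Server 2000 SP4
ROLLUP_BUILD = 2187  # build reached by the hotfix rollup mentioned above


def build_number(product_version):
    """Extract the build from a version string such as '8.00.2187'."""
    parts = product_version.strip().split(".")
    if len(parts) < 3:
        raise ValueError("expected major.minor.build, got %r" % product_version)
    return int(parts[2])


def patch_state(product_version):
    """Classify a SQL Server 2000 version string relative to SP4 and the rollup."""
    build = build_number(product_version)
    if build < SP4_BUILD:
        return "pre-SP4"
    if build < ROLLUP_BUILD:
        return "SP4 without the 2187 rollup"
    return "2187 rollup or later"


print(patch_state("8.00.2039"))  # SP4 without the 2187 rollup
print(patch_state("8.00.2187"))  # 2187 rollup or later
```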


I was more than a little surprised that there didn't seem to be much about these particular problems with SP4 on the web. It does perhaps only affect reasonably big servers - lots of memory and multiple processors - but even so, there wasn't really a lot to warn us off the idea that SP4 would be a stable, tested update. Something of a lesson learned - a day of barely-suppressed panic to get back to where we were at the start of the day - although I'm not sure that it would have been easy to be any wiser before the event.