.NET Nugget 23: SQL Server Substring Not Like .NET Substring

March 28, 2010 (Last Modified: March 29, 2010)

I ran into something this week that was very unexpected: the SQL Server implementation of substring does NOT work like the substring implementation I’m use to in .NET (or really any other language I’ve run into). While this .Net Nugget is really for TSQL, it did stand out as a difference from how .NET does it so I thought it was worth mentioning.

The docs on SQL Server say this:

*“start_expression: Is an integer or **bigint *expression that specifies where the returned characters start. If start_expression is less than 1, the returned expression will begin at the first character that is specifed [sic] in value_expression. In this case, the number of characters that are returned is the largest value of either the sum of start_expression and length_expression or 0. If start_expression is greater than the number of characters in the value expression, a zero-length expression is returned.”

The key here is what it says about a negative starting value. The way I read this to work the first time led me to believe that a negative number provided (or even a zero) would start the resulting string at the start of the input string. This isn’t the case.

With the following

SELECT SUBSTRING(‘Hello Cruel World’, 0, 3)

I would expect to get a result of ‘Hel’, but in fact I get ‘He’.

SELECT SUBSTRING(‘Hello Cruel World’, -1, 3)

the above actually yields ‘H’. and

SELECT SUBSTRING(‘Hello Cruel World’, -3, 3)

produces an empty string.

While this behavior technically agrees with what the docs are indicating, this seems very counter-intuitive to me.

I was burned by this because I was doing a strip on a column that was for zip codes that were defined in our staging table as being VARCHAR(10), but in the destination table as VARCHAR(5). The business rule was to strip excess characters from the imported data, so I used:

SUBSTRING(postalCode,0,5)

This had the effect of stripping the last digit from the zip code, which as you can imagine, was not the desired affect. This exactly the parameters you would use in .NET, but not so much in TSQL. You learn something new all the time. I added a community comment to the MSDN docs to hopefully save other people from a similar surprise.

UPDATE (3/29/2010): Yes, I realize if I had tests around this I would have not have burned by it. This is a case of “I’m only running this migration code once and I’ll throw that code away so I really don’t need a test” that came back to bite me. Luckily I had the ability to easily reverse the damage. I also learned the lesson of having at least an end-to-end test in these types of scenarios is likely a good idea. Live and Learn.