This is still one of the most common problems I see in queries.
People are terrified of NULLs. People are afraid to merge on freeways in Los Angeles.
What results is this endless stream of poorly performing queries, and some surprise logic bugs along the way.
I don’t have much more else of an intro. The TL;DR is that you should use natural expressions like
IS NULL or
IS NOT NULL, rather than any of the built in functions available to you in SQL Server, like
COALESCE, et al. which are presentation layer functions with no relational meaning whatsoever.
From here on out, we’ll be calling them unnatural expressions. Perhaps that will get through to you.
First is something I’ve covered before, but when you use unnatural expressions, the optimizer will not give you feedback about useful indexes.
The first query generates a missing index request, the second one does not. The optimizer has abandoned all hope with the use of an unnatural expression.
The other issue with unnatural expressions comes down to implicit conversion.
Take this, for instance.
DECLARE @i int = 0; SELECT c = CASE ISNULL(@i, '') WHEN '' THEN 1 ELSE 0 END;
This will return a 1, because 0 and ” can be implicitly converted.
Perhaps less obvious, and more rare, is this:
DECLARE @d datetime = '19000101'; SELECT c = CASE ISNULL(@d, '') WHEN '' THEN 1 ELSE 0 END;
Which will also return 1.
Not many databases have stuff going back to 1900, but I do see people using that as a canary value often enough.
If that’s not enough to get you off the idea, let’s look at how this stuff plays out in the real world.
First, let’s get ourselves an index. Without that, there’s fundamentally no difference in performance.
CREATE INDEX v ON dbo.Votes (BountyAmount);
Our gold standard will be these two queries:
SELECT c = COUNT_BIG(*) FROM dbo.Votes AS v WHERE v.BountyAmount IS NULL; SELECT c = COUNT_BIG(*) FROM dbo.Votes AS v WHERE v.BountyAmount IS NOT NULL;
The first one that checks for NULL values returns a count of 182,348,084.
The second one that checks for NOT NULL values returns a count of 344,070.
Keep those in mind!
The query plans for them both look like this:
Which run, respectively (and respectably), in 846ms and 26ms. Obviously the query with the more selective predicate will have a time advantage, here.
Here’s where things start to go wrong.
This query returns incorrect results, but you’re probably used to that because of all the NOLOCK hints in your queries anyway.
SELECT c = COUNT_BIG(*) FROM dbo.Votes AS v WHERE ISNULL(v.BountyAmount, '') = '';
A count of 182,349,088 is returned rather than 182,348,084, because there are 1004 rows with a bounty of 0.
Even though we have an empty string in our query, it’s implicitly converted to 0.
And you thought you were so clever.
The exercises in futility that I see people carrying on with often look make use of
It’s worth noting here that
COALESCE is only a
CASE expression underneath, anyway. They are interchangeable in this respect.
For findings NULLs, people will screw up and do this:
SELECT c = COUNT_BIG(*) FROM dbo.Votes AS v WHERE ISNULL(v.BountyAmount, -1) = -1; SELECT c = COUNT_BIG(*) FROM dbo.Votes AS v WHERE COALESCE(v.BountyAmount, -1) = -1;
We can use -1 here because it doesn’t naturally occur in the data. Results are correct for both, but performance is comparatively horrible.
We’re looking at 2.5 seconds compared to 900ms. This situation gets worse with the more selective predicates, too.
These both take roughly the same time as the other unnatural forms of this query, but recall the natural version of this query finished in under 30ms.
I hope I don’t have to write about this anymore, but at the rate I see people doing this stuff, I kind of doubt it.
Broken Record Enterprises, it feels like sometimes.
I’m not sure why anyone thinks this is a good idea. I’ve heard rumors that it comes from application developers who are used to NULLs throwing errors writing SQL queries, where they don’t pose the same threat.
Who knows, though. Maybe people just really like the festive pink text color that functions turn in SSMS.
Thanks for reading!
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount on to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.
- Inline Table Valued Functions And Catch All Queries In SQL Server
- Using Views To Reduce Memory Grants In SQL Server
- Getting The Top Value Per Group With Multiple Conditions In SQL Server: Row Number vs. Cross Apply With MAX
- Getting The Top Value Per Group In SQL Server: Row Number vs. Cross Apply Performance