Like most tricks, this has a specific use case, but can be quite effective when you spot it.
I’m going to assume you have a vague understanding of parameter sniffing with stored procedures going into this. If you don’t, the post may not make a lot of sense.
Or, heck, maybe it’ll give you a vague understanding of parameter sniffing in stored procedures.
One For The Money
Say I have a stored procedure that accepts a parameter called @Reputation.
The body of the procedure looks like this:
SELECT TOP (1000) u.* FROM dbo.Users AS u WHERE u.Reputation = @Reputation ORDER BY u.CreationDate DESC;
In the users table, there are a lot of people with a Reputation of 1.
There are not so many with a Reputation of 2.
+------------+---------+ | Reputation | records | +------------+---------+ | 1 | 1090043 | | 2 | 1854 | +------------+---------+
Two For The Slow
Data distributions like this matter. They change how SQL Server approaches coming up with an execution plan for a query.
Which indexes to use, what kind of joins to use, how to aggregate data, if the plan should be serial or parallel…
The list goes on and on.
In this case, we have a narrow-ish nonclustered index:
CREATE INDEX ix_whatever ON dbo.Users (Reputation, Age, CreationDate);
When I run my stored procedure and look for Reputation = 2, the plan is very fast.
EXEC dbo.WORLDSTAR @Reputation = 2;
This is a great plan for a small number of rows.
When I run it for a large number of rows, it’s not nearly as fast.
EXEC dbo.WORLDSTAR @Reputation = 1;
We go from a fraction of a second to over three seconds.
This is bad parameter sniffing.
If we run it for Reputation = 1 first, we don’t have the same problem.
That’s good(ish) parameter sniffing.
Better For Everyone
Many things that prevent parameter sniffing will only give you a so-so plan. It may be better than the alternative, but it’s certainly not a “fix”.
It’s possible to get a better plan for everyone in this situation by re-writing the Key Lookup as a self join
SELECT TOP (1000) u2.* FROM dbo.Users AS u JOIN dbo.Users AS u2 ON u.Id = u2.Id WHERE u.Reputation = @Reputation ORDER BY u.CreationDate DESC;
The reason why is slightly complicated, but I’ll do my best to explain it simply.
Here’s what the bad parameter sniffing plan looks like for each query.
Note that the Key Lookup plan still runs for ~3 seconds, while the self-join plan runs for around half a second.
While it’s possible for Key Lookups to have Sorts introduced to optimize I/O… That doesn’t happen here.
The main difference between the two plans (aside from run time), is the position of the Sort.
In the Key Lookup plan (top), the Key Lookup between the nonclustered and clustered indexes runs to completion.
In other words, for everyone with a Reputation of 1, we go to the clustered index to retrieve the columns that aren’t part of the nonclustered index.
In the self-join plan (bottom), all rows go into the Sort, but only the 1000 come out.
The difference is more obvious when viewed with Plan Explorer.
In the Key Lookup plan, rows aren’t narrowed until the end so a seek occurs ~1mm times.
In the self-join plan, they’re eliminated directly after the Index Seek, so the join only runs for 1000 rows and produces 1000 seeks.
This doesn’t mean that Top N Sorts are bad, it just means that they may not produce the most optimal plans for Key Lookups.
When This Doesn’t Work
Without a TOP, the self-join pattern isn’t as dramatically faster, but it is about half a second better (4.3s vs. 3.8s) for the bad parameter sniffing scenario, and far less for the others.
Of course, an index change to put CreationDate as the second key column fixes the issue by removing the need to sort data at all.
CREATE INDEX ix_whatever --Current Index ON dbo.Users (Reputation, Age, CreationDate); GO CREATE INDEX ix_apathy --Better Index For This Query ON dbo.Users (Reputation, CreationDate, Age); GO
But, you know, not everyone is able to make index changes easily, and changing the key column order can cause problems for other queries.
Thanks for reading!