Gotta Have It
Sometimes I think it’s interesting how adding a seemingly useless or harmless thing to a query can change the query plan.
Here’s a quick example using an Order By on an indexed column.
I understand that without an ORDER BY, any TOP query will be non-deterministic. In this case, that’s okay. I only want to know if any Id exists in the Votes table for votes types that aren’t 5 or 8. Order doesn’t matter to me.
SELECT TOP (1) u.Id FROM dbo.Users AS u WHERE EXISTS ( SELECT 1 / 0 FROM dbo.Votes AS v WHERE v.UserId = u.Id AND v.VoteTypeId NOT IN ( 5, 8 ) );
The trouble is that this query runs for about 10 seconds to find nothing.
Yes, there are many other ways to express this query — you might even use a COUNT, which would bypass the problem — but hey, some people love TOPs.
Hors d’Oeuvres By
Adding an order by here has a rather significant impact on the query plan, even though the column I’m asking to be ordered is the PK/CX of the Users table, meaning it’s already in order.
SELECT TOP (1) u.Id FROM dbo.Users AS u WHERE EXISTS ( SELECT 1 / 0 FROM dbo.Votes AS v WHERE v.UserId = u.Id AND v.VoteTypeId NOT IN ( 5, 8 ) ) ORDER BY u.Id;
The query plan now looks like this:
Who’s That Sort?
Why did that happen? Let’s take a look!
Notice that the Sort isn’t taking place for the Users table, but rather the Votes table.
We’re putting the UserId column in order now. This is to help us with the Nested Loops operator, which has slightly different properties now.
Notice how one Nested Loops join used an Ordered Prefetch, and the other uses an Unordered Prefetch?
That’s a side effect of the ORDER BY.
And, yeah, the plan with the Order By is “faster” because it went parallel. That won’t always be the case, and when it’s not, any efficiency is lost.
Something To Keep In Mind
Asking for ordered data can change a lot of things about a query. More superficial things, like indexes used, joins and aggregates chosen, parallelism or serial(ism?). It can also change less obvious things, like memory grants, the type of prefetch used, etc.
Sometimes you don’t have a choice in the matter — you need data in a specific order at some point in the query for correctness — but quite often presentation layer ordering is best left out of your queries. Unless of course you have indexes that store data in the order you want, so there’s no extra work incurred.
Thanks for reading!