I got a mailbag question recently about some advice that floats freely around the internet regarding indexing for windowing functions.
But even after following all the best advice that Google could find, their query was still behaving poorly.
Why, why why?
Ten Toes Going Up
Let’s say we have a query that looks something like this:
SELECT u.DisplayName, u.Reputation, p.Score, p.PostTypeId FROM dbo.Users AS u JOIN ( SELECT p.Id, p.OwnerUserId, p.Score, p.PostTypeId, ROW_NUMBER() OVER ( PARTITION BY p.OwnerUserId, p.PostTypeId ORDER BY p.Score DESC ) AS n FROM dbo.Posts AS p ) AS p ON p.OwnerUserId = u.Id AND p.n = 1 WHERE u.Reputation >= 500000 ORDER BY u.Reputation DESC, p.Score DESC;
Without an index, this’ll drag on forever. Or about a minute.
But with a magical index that we heard about, we can fix everything!
Ten Toes Going Down
And so we create this mythical, magical index.
CREATE INDEX bubble_hard_in_the_double_r ON dbo.Posts ( OwnerUserId ASC, PostTypeId ASC, Score ASC );
But there’s still something odd in our query plan. Our Sort operator is… Well, it’s still there.
Oddly, we need to sort all three columns involved in our Windowing Function, even though the first two of them are in proper index order.
OwnerUserId and PostTypeId are both in ascending order. The only one that we didn’t stick to the script on is Score, which is asked for in descending order.
This is a somewhat foolish situation, all around. One column being out of order causing a three column sort is… eh.
We really need this index, instead:
CREATE INDEX bubble_hard_in_the_double_r ON dbo.Posts ( OwnerUserId ASC, PostTypeId ASC, Score DESC );
Granted, I don’t know that I like this plan at all without parallelism and batch mode, but we’ve been there before.
Thanks for reading!
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount on to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.
- Common Table Expressions Are Useful For Rewriting Scalar Functions In SQL Server
- Multiple Distinct Aggregates: Still Harm Performance Without Batch Mode In SQL Server
- What’s Really Different About In-Memory Table Variables In SQL Server?
- Mind Your OUTPUT Targets In SQL Server, Some Of Them Hurt Query Performance