A Suggestion To Make Lookups Less Painful

Odor Of Gas


One problem with Lookups, aside from the usual complaints, is that the optimizer has no options for when the lookup happens.

If the optimizer decides to use a nonclustered index to satisfy some part of the query, but the nonclustered index doesn’t have all of the columns needed to cover what the query is asking for, it has to do a lookup.

Whether the lookup is Key or RID depends on if the table has a clustered index, but that’s not entirely the point.

The point is that there’s no way for the optimizer to decide to defer the lookup until later in the plan, when it might be more opportune.

Gastric Acid


Let’s take one index, and two queries.

CREATE INDEX p
    ON dbo.Posts(PostTypeId, Score, CreationDate)
    INCLUDE(OwnerUserId);

Stop being gross.

SELECT TOP (1000)
    u.DisplayName,
    p.*
FROM dbo.Posts AS p
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 5
ORDER BY p.CreationDate DESC;

SELECT TOP (1000)
    u.DisplayName,
    p.*
FROM dbo.Posts AS p
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 6
ORDER BY p.CreationDate DESC;

The main point here is not that the lookup is bad; it’s actually good, and I wish both queries would use one.

odd choice

If we hint the first query to use the nonclustered index, things turn out better.

SELECT TOP (1000)
    u.DisplayName,
    p.*
FROM dbo.Posts AS p WITH(INDEX = p)
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 5
ORDER BY p.CreationDate DESC;
woah woah woah you can’t use hints here this is a database

Running a full second faster seems like a good thing to me, but there’s a problem.

Ingest


Whether we use the lookup or scan the clustered index, all of these queries ask for rather large memory grants, between 5.5 and 6.5 GB

bigsort4u

The operator asking for memory is the Sort — and while I’d love it if we could index for every sort — it’s just not practical.

So like obviously changing optimizer behavior is way more practical. Ahem.

The reason that the Sort asks for so much memory in each of these cases is that it’s forced to order the entire select output from the Posts table by the CreationDate column.

donk

Detach


If we rewrite the query a bit, we can get the optimizer to sort data long before we go get all the output columns:

SELECT TOP (1000)
    u.DisplayName,
    p2.*
FROM dbo.Posts AS p
JOIN dbo.Posts AS p2
    ON p.Id = p2.Id
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 5
ORDER BY p.CreationDate DESC;

SELECT TOP (1000)
    u.DisplayName,
    p2.*
FROM dbo.Posts AS p
JOIN dbo.Posts AS p2
    ON p.Id = p2.Id
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 6
ORDER BY p.CreationDate DESC;

In both cases, we get the same query plan shape, which is what we’re after:

  • Seek into the nonclustered index on Posts
  • Sort data by CreationDate
  • Join Posts to Users first
  • Join back to Posts for the select list columns
weeeeeeeeee

Because the Sort happens far earlier on in the plan, there’s less of a memory grant needed, and by quite a stretch from the 5+ GB before.

turn down

Thanks for reading!

A Word From Our Sponsors


First, a huge thank you to everyone who has bought my training so far. You all are incredible, and I owe all of you a drink.

Your support means a lot to me, and allows me to do nice stuff for other people, like give training away for free.

So far, I’ve donated $45k (!!!) worth of training to folks in need, no questions asked.

Next year, I’d like to keep doing the same thing. I’d also like to produce a whole lot more training to add value to the money you spend. In order to do that, I need to take time off from consulting, which isn’t easy to do. I’m not crying poor, but saying no to work for chunks of time isn’t easy for a one-person party.

I’m hoping that I can make enough in training bucks to make that possible.

Because this sale is extra sale-y, I’ve decided to name it after the blackest black known to man.

From today until December 31st, you can get all 25 hours of my recorded training content for just $100.00. If you click the link below to add everything to your cart, and use the discount code AllFor100 to apply a discount to your cart.

Everything

Everything

Everything

Some fine print: It only works if you add EVERYTHING. It’s a fixed amount discount code that you need to spend a certain amount to have kick in.

Thank for reading, and for your support.



3 thoughts on “A Suggestion To Make Lookups Less Painful

  1. Hi Erik,

    Just a suggestion…

    So we (your readers) can repeat your good experiment and play a bit to see if there are other alternatives, what is the source of data that you’re playing this code against and were can we get it from?

    Just as a posting tip from an old dude from the world of “Do Drop In”, if you commonly use the data source (and I suspect it is probably the SO DB) for your posts, it would be helpful to your readers (especially this old fart) if you were to develop a standard blurb about where to get it that you could cut and paste into the beginning of each article.

  2. This is my new favourite (well, new to me) tuning tip.

    Just tested it in my home lab on StackOverflow2013, and the execution time went from “I could have made a sandwich & coffee in the time it took to execute” down to 1 second.

Leave a Reply

Your email address will not be published. Required fields are marked *