Let’s Design An Index Together Part 3

Previously

We had a couple queries we wanted to make fast, but SQL Server’s missing index request had mixed results.

Our job now is to figure out how to even things out. To do that, we’re gonna need to mess with out index a little bit.

Right now, we have this one:

Which is fine when we need to Sort a small amount of data.

There’s only about 25k rows with a PostTypeId of 4. That’s easy to deal with.

The problem is here:

Theres 6,000,223 rows with a PostTypeId of 1 — that’s a question.

Don’t get me started on PostTypeId 2 — that’s an answer — which has 11,091,349 rows.

Change Management

What a lot of people try first is an index that leads with Score. Even though it’s not in the WHERE clause to help us find data, the index putting Score in order first seems like a tempting fix to our problem.

The result is pretty successful. Both plans are likely fast enough, and we could stop here, but we’d miss a key point about B-Tree indexes.

It’s not so bad.

What’s a bit deceptive about the speed is the amount of reads we do to locate our data.

Scan-Some

We only need to read 15k rows to find the top 5000 Questions — remember that these are very common.

We need to read many more rows to find the top 5000… Er… Whatever a 4 means.

Imaginary Readers

Nearly the entire index is read to locate these Post Types.

Meet In The Middle

The point we’d miss if we stopped tuning there is that when we add key columns to a B-Tree index, the index is first ordered by the leading key column. If it’s not unique, then the second column is ordered within each range of values.

Pale Coogi Wave

Putting this together, let’s change our index a little bit:

With the understanding that seeking to a single PostTypeId column will bring us to an ordered Sort column for that range of values.

Now our plans look like this:

🔥🔥🔥

Which allows us to both avoid the Sort and keep reads to a minimum.

reed les

Interior Design

When designing indexes, it’s important to keep the goal of queries in mind. Often, predicates should be the primary consideration.

Other times, we need to take ordering and grouping into account. For example, if we’re using window functions, performance might be unacceptable without indexing the partition by and order by elements, and we may need to move other columns to parts of the index that may not initially seem ideal.

Thanks for reading!

5 thoughts on “Let’s Design An Index Together Part 3”

  1. Makes sense; not that you needed any confirmation. 🙂
    I like how you made this a challenge! It was a fun thought experiment!
    I thought, “he wants us to think; this isn’t just ‘predicate indexing'”.

    There is a single-valued equality predicate and a “left equal and/or expanding” (probably a better term for this) predicate. So, the kneejerk keys are not optimal for both, or other, statements.

    The engine must scan the [seek] range, anyway (may be able to bail out under certain circumstances or conditions), at least it’s ordered, as requested, and can return the matching rows on LastActivityDate.

    This was fun!

  2. What would you do if the where condition is still a predicate after you have an index that would support it? It’s still scanning the whole table. The dates just aren’t selective enough?

      1. i’ve got a query with a few joins, mostly left. the only where clause is ActivityEndTime > @startDateTime AND ActivityStartTime < @endDateTime and this is still showing up as a predicate and avoiding the index on starttime/endtime and doing a full scan. Puzzled why?

Leave a Reply

Your email address will not be published. Required fields are marked *