Deduplicating Missing Index Requests Part 3


We’ve got a set of missing index requests for a single table, and we’ve got the queries asking for them.

Going back to our queries and our index requests, all the queries have two things in common:

  • They filter on OwnerUserId
  • They order by Score

There are of course other elements in the where clause to attend to, but our job is to come up with one index that helps all of our queries.

Query Real Hard

To recap, these are our queries.

FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
GO 10

FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND   p.CreationDate >= '20130101'

FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND   p.PostTypeId = 1

FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND   p.LastActivityDate >= '20130101'

FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND   p.Score > 0

Index Real Dumb

Which means that all of our missing index requests are going to be on maybe a couple key columns, and then include every other column in the Posts table.

This is a bad idea, so we’re going to dismiss the includes and focus on keys.

CREATE INDEX [OwnerUserId_LastActivityDate_Includes] 
    ON [StackOverflow2013].[dbo].[Posts] ([OwnerUserId], [LastActivityDate]);

CREATE INDEX [OwnerUserId_Score_Includes] 
    ON [StackOverflow2013].[dbo].[Posts] ([OwnerUserId], [Score]);

CREATE INDEX [OwnerUserId_PostTypeId_Includes] 
    ON [StackOverflow2013].[dbo].[Posts] ([OwnerUserId], [PostTypeId]);

CREATE INDEX [OwnerUserId_CreationDate_Includes] 
    ON [StackOverflow2013].[dbo].[Posts] ([OwnerUserId], [CreationDate]);

CREATE INDEX [OwnerUserId_Includes] 
    ON [StackOverflow2013].[dbo].[Posts] ([OwnerUserId]);

Now that we’ve got a more sane bunch of requests to focus on, let’s do something thinking.

I hate thinking, so we won’t do a lot of it.

Indexes put data in order, and equality predicates preserve ordering of secondary index columns. That makes putting the key on (OwnerUserId, Score) a no-brainer. One could make an entire career out of avoiding sorting in the database.

But now we have three other columns to think about: LastActivityDate, PostTypeId, and CreationDate.

We could spend a whole lot of time trying to figure out the best order here, considering things like: equality predicates vs inequality predicates, and selectivity, etc.

But what good would it do?

Dirty Secret

No matter what order we might put index key columns in after Score, it won’t matter. Most of our queries don’t search on OwnerUserId and then Score. Only one of them does, and it doesn’t search on anything else.

That means that most of the time, we’d be seeking to OwnerUserId, and then performing residual predicates against other columns we’re searching on.

On top of that, we’d have whatever overhead there is of keeping things in order when we modify data in the key of the index. Not that included columns are free-of-charge to modify, but you get my point. There’s no order preserved in them.

In reality, a good-enough-index for the good-enough-optimizer to come up with a good-enough-plan looks like this:

CREATE INDEX good_enough
    ON dbo.Posts
        (OwnerUserId, Score)
        (PostTypeId, CreationDate, LastActivityDate);


The index above does two things:

  • It helps us search on a selective predicate on OwnerUserId
  • It keeps Score in order after the quality so the order by is free
  • It has all the other potential filtering elements so we can apply predicates locally
  • It teaches us that include column order doesn’t matter

All of the query plans will look roughly like this, regardless of the where clause:

you can do it

What Difference Does It Make?

Alright, so we’ve got one good-enough index for a bunch of different queries. By adding the index, we got all of them to go from taking ~600ms to taking 0ms.

What else did we do?

  • We made them faster without going parallel
  • They no longer need memory to sort data

And we did it without creating a gigantic covering index.

Of course, the optimizer still thinks we need indexes…

of what?

But do we really need them?


77% of nothing is nothing.

Thanks for reading!

Deduplicating Missing Index Requests Part 2


In yesterday’s post, we talked a little about different ways to approach looking at missing index requests, and how their benefit is calculated in sp_BlitzIndex.

If you’re on SQL Server 2019, you may be able to get some idea which queries are generating missing index requests. It’s not documented yet 🤔, but that’s never stopped anyone from using anything in production.

Let’s look at the queries asking for them.

Natural Ruckus

Let’s take a look at what queries are causing those missing index requests.

Since I’m running them, I don’t have to do any work.

FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656

FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND   p.CreationDate >= '20130101'

FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND   p.PostTypeId = 1

FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND   p.LastActivityDate >= '20130101'

FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND   p.Score > 0

Without any helpful indexes, all of these take about the same amount of time — between 600 and 700ms, and most of the time is spent scanning the clustered index on Posts, which is around 17 million rows, just to return ~27k rows..

That’s Not “Slow”

Usually when people are complaining about slow queries, they’re talking about stuff that drags on for several seconds or longer.

At just about half a second, most people wouldn’t getting a running start to jump through hoops to make these faster.

But they all have a “high cost” of a touch over 3000 query bucks. If you’re the type of person who only focuses on one metric, you might be overlooking a whole lot of things that need attention.

spuriously yours

If we made this query 99% faster, it would mean a rather miniscule improvement in elapsed time. The flip side of the one-metric thing is using duration alone as a metric. Queries might have a very long duration because they’re blocked, but use minimal resource when they’re finally allowed to run by the gods of ACID compliance.

My favorite queries to find and tune are ones that:

  • Unnecessarily run for a long time, using a lot of CPU
  • Unnecessarily ask for large memory grants
  • Have issues with parameter sniffing

Let’s pretend there might be some value to tuning these, though. Maybe we’re upset that they’re going parallel. Maybe we have something against scanning clustered indexes. Maybe we just don’t have anything else to do.

In tomorrow’s post, we’ll look at how to take all of those requests and come up with one good enough-index for our queries.

Just like how the optimizer comes up with one “good enough” plan for all queries.

Thanks for reading!

Deduplicating Missing Index Requests Part 1


I often find myself reviewing missing index requests during consulting engagements. Not because they’re so awesome, but because they’re often just good enough to provide some quick relief before more fine-tuned efforts are explored.

More to the point: if someone has no idea which queries they need to tune, and everything is pretty slow, this is a good starting place.

Given sufficient server uptime, of course.

World Tour

Sometimes you’ll see that slam-dunk missing index request with lots of uses, and you can tie it to a query that you know is bad. Of course, I’m quite partial to using sp_BlitzIndex to analyze indexes. There are a few different places that missing indexes will be detailed in.

  • Mode 0: the most important stuff
  • Mode 4: anything and everything
  • Mode 3: just missing index requests
  • Table Mode: analyzing just one table

The easiest way to find examples like I’ll be talking about is to look at just one table. In this case, the Posts table.

EXEC sp_BlitzIndex @TableName = 'Posts';

If you have missing index requests for a table, they’ll look something like this:

where you ack

That estimated benefit number is pretty big here, so it jumps out a bit. Normally I don’t start really paying attention until that number is >5 million. That’s not terribly scientific, but you have to draw the line somewhere.

Of course, one very sneaky thing to consider is when you have a set of duplicative requests with low-ish estimated benefit individually, but combined they just might add up to something quite useful.

Just A Kid

The estimated benefit number is just a function of the three feedback metrics that get logged with missing index requests: uses * impact * average query cost.

Uses is a fairly reliable metric, but impact and average query cost are a little more hand-wavy. Even high-cost queries can be very fast. It doesn’t mean that they can’t be tuned or don’t need indexes, but they might not be your worst-performers.

In tomorrow’s post, we’ll look at that, and how you can come up with a good-enough index for a bunch of similar queries.

Thanks for reading!

Join Me At Data Platform Summit 2020!

The Road From Nowhere

This year, I’m teaching an 8 hour online workshop at Data Platform Summit, and I’d love it if you joined me.

Here’s what I’ll be teaching:

Class Title: The Beginner’s Guide To Advanced Performance Tuning

Abstract: You’re new to SQL Server, and your job more and more is to fix performance problems, but you don’t know where to start.

You’ve been looking at queries, and query plans, and puzzling over indexes for a year or two, but it’s still not making a lot of sense.

Beyond that, you’re not even sure how to measure if your changes are working or even the right thing to do.

In this full day performance tuning extravaganza, you’ll learn about all the most common anti-patterns in T-SQL querying and indexing, and how to spot them using execution plans. You’ll also leave knowing why they cause the problems that they do, and how you can solve them quickly and painlessly.

If you want to gain the knowledge and confidence to tune queries so they’ll never be slow again, this is the training you need.

Date: Dec 7 & 8.

Time: 12 PM to 04 PM EST (View in your timezone)

Tickets: Tickets here!

Reasons Your Query Is Slow Right Now: Server Changes

You And What Change Management Board?

I know, it sounds foolish. Trivial, even. Don’t let people make changes without permission.

But the number of times I’ve been reviewing server hardware and settings with people, and they’ve started head scratching about how something ended up the way it did is fairly astounding.

There’s almost no way to audit every since change, either.

Fun, right? SQL is fun. That’s why everyone acts like it’s terrible. We wanna keep the fun to ourselves.

Sticker Shock

There are a lot of things that can change under the covers.

One fairly sneaky thing is if someone makes sp_configure changes without running configure, that kick in after a reboot.

Unexpected failovers, patching cycles that require reboots, and all sorts of other bump-in-the-night incidents can lead to this.

On top of that, there are just so many settings that can cause erratic performance, it’d be disingenuous of me to list them all here, even though it’d probably be great for SEO.

Instead, have a scroll through this list. See what I mean?

You’re gonna have a hard time remembering to keep track of what all your settings are now, and if they’re different in the middle of a performance storm. I don’t usually tell people to run sp_Blitz when that’s going on, but it can be a good sanity check, too.

Not all performance problems are caused by the robots. Some are caused by the humans, too.

For a general check, I’ll run it like this:

sp_Blitz @CheckServerInfo = 1;

You might be able to get to the root cause of things pretty quickly.

Thanks for reading!


Reasons Your Query Is Slow Right Now: A Bad Plan Appears

Update Stats, They Said

Okay look, you probably should update stats. At least when you do it, you have some control over the situation.

If you let SQL Server get up to its own devices, you might become quite surprised.

One after-effect of updated stats is, potentially, query plan invalidation. When that happens, SQL Server might get hard to work coming up with a new plan that makes sense based on these new statistics.

And that, dear friends, is where things can go bad.

New Contributor đź‘‹

Let’s say we have this query, which returns the average post and comment score for a single user.


SELECT u.DisplayName, 
       AVG(p.Score * 1.) AS lmao_p,
       AVG(c.Score * 1.) AS lmao_c
FROM   dbo.Users AS u
JOIN   dbo.Posts AS p
    ON p.OwnerUserId = u.Id
JOIN   dbo.Comments AS c
    ON c.UserId = u.Id
WHERE u.Id = @UserId
GROUP BY u.DisplayName;


Most of the time, the query runs fast enough for the occasional run to not end too poorly.

But then a recompile happens, and a new contributor decides to look at their profile.

Okay To Worse

What comes next you could fill a textbook with.

EXEC dbo.AveragePostScore @UserId = 3150367;

A new plan gets compiled:

wouldn’t get far

And you know, it looks great for a new user.

And you know, it looks not so great for a slightly more seasoned user.

you shouldn’t have!

So What Changed?

Running the query first for a user with a bit more site history gives us a plan with a very different shape, that finishes in under 2 seconds. Repeating that plan for less experienced users doesn’t cause any problems. It finishes in very little time at all.


The plan itself remains largely more familiar than most parameter sniffing scenarios wind up. There are plenty more similarities than differences. It really does just come down to join order here.

Alright, now we know what happened. How would we figure out if this happened to us IRL?

I Shot The Trouble

We can do what we did yesterday, and run sp_BlitzFirst. That’ll warn us if stats recently got updated.

EXEC sp_BlitzFirst @Seconds = 10, @ExpertMode = 1;

If we want to try to validate if that stats update messed with a plan, we can use sp_WhoIsActive:

EXEC sp_WhoIsActive @get_plans = 1, @get_avg_time = 1;

The get_avg_time parameter is one I’ve talked about before, and in this case it’s quite helpful.

We’ve got a bunch of queries that usually run in 0 seconds running for at least 20 seconds.

bonked out


Since we used get_plans here too, we can grab the execution plan, and see which parameters were used on compilation and execution.

Get the properties of the select operator, either by right clicking and selecting properties, or hitting F4.

not yet

Now we know two things, and can test four things:

  • If we run this query using the compiled value, is it fast?
  • If we run this query using the runtime value, is it slow?
  • If we recompile and run it for the runtime value, is it fast or slow?
  • Is the compile time value still fast when it uses the “new” plan?

This is a little easier with stored procs, because you can do something like:

EXEC sys.sp_recompile @objname = N'dbo.AveragePostScore';

To use DBCC FREEPROCCACHE to target a specific query, you need the sql handle or plan handle. You don’t wanna jump off and clear the whole cache here, unless you’re desperate. Just make sure you understand that you might fix one query, and break others, if you clear the whole thing.

It’s better to be targeted when possible.

And of course, if you’ve got Query Store up and running, you may do well to look at Regressed or High Variance query views there, and force the faster plan.

Thanks for reading!

Reasons Your Query Is Slow Right Now: Unchecked Maintenance

Chocolate Attack

I get it. You’re a good DBA.

No, you’re a great DBA.

You have alerts for when jobs fail, when they run long, and they’re all emailed to the ENTIRE COMPANY in beautifully formatted CSS.

Have a cigar.

But we all go on vacation sometime.


One thing that can make a workload feel artificially sluggish is maintenance.

  • Index stuff: Lotsa work, sometimes blocking
  • Backup stuff: Native fulls of VLDBs, with compression
  • DBCC CHECKDB: Oh, you forgot to run this because you were busy rebuilding indexes

Darn. Gimme that cigar back.

Right now, I’ve got a DBCC CHECK, and a query workload, that both run for one minute and 20 seconds when they run alone.

Quick Queries


The problem becomes when I run them both at the same time. The query workload runs for about 25 seconds longer, and CHECKDB runs for two and a half minutes.

That’s like, almost twice as long.

Ugly Queries

Now, it’s okay if you don’t have all those alerts set up. I guess.

Because you can use sp_BlitzFirst to see what’s going on, and get warned:

EXEC sp_BlitzFirst @Seconds = 10, @ExpertMode = 1;
Out of time

If you can end maintenance tasks and have performance go back to normal-bad, congratulations! You’ve solved today’s problem.

Have a cigar.

Thanks for reading!

Reasons Your Query Is Slow Right Now: Blocking, Blocking, Blocking


Blocking sucks. SQL Server should have used an optimistic isolation level by default.

Sure, writers would still block each other, but think of all the time people could have saved not explaining NOLOCK hints.

  • Yes, they still take locks
  • Yes, it’s the same as READ UNCOMMITTED
  • Yes, sometimes they make queries faster without blocking

Wait, what?

Moving Right Along

There are some interesting things to say about blocking, I suppose, aside from that it sucks.

For instance, it can look really weird when there’s parallelism. And read queries can block write queries.

There, I’ve said interesting things about blocking. It still sucks.

But how do you diagnose it?

First, close Activity Monitor.

Doctor, Doctor

My favorite tool for diagnosing blocking is sp_WhoIsActive, and I’m going to show you three different ways to look at it.

The safest way

EXEC sp_WhoIsActive @get_task_info = 2, @get_additional_info = 1;

This isn’t the way most people do it, but it is the least impactful.

You get back the normal set of results:


Cool, you can see queries, lock waits, and blocking sessions. But a short scroll to the right also brings you to this additional_info clickable column:

oily water

It’s not so helpful for the query doing the blocking, but it’ll tell you what the queries being blocked are stuck on.

The usual way

EXEC sp_WhoIsActive @get_locks = 1;

This is the way I normally demo looking for blocking with it, because it is more useful to see what the blocking query is doing.

But I’ve also had it be slow. Really, really slow.

That can happen when there is JUST SO MUCH BLOCKING that it takes a long time to enumerate all of it.

But you’ll get back this lovely sort of detail:

say it loud

Page locks. 102. Great.

The smart way

EXEC sp_WhoIsActive @find_block_leaders = 1, @sort_order = '[blocked_session_count] DESC';

It’s not always obvious by runtime which session is causing blocking, so you can use this command to find blocking chains, and order output by who’s doing the most blocking.



When you need to troubleshoot live blocking, these are the ways I usually check in on things with sp_WhoIsActive. It’s hard to beat.

Of course, long term, a monitoring tool, or logging the blocked process report to a table can help you when you’re not there hitting F5 live and in-person.

We’ve peeled back a couple scenarios where oddball things can slow a server down. Tomorrow we’ll look at a new one!

What’ll it be?

Your guess is not as good as mine.

Thanks for reading!

Reasons Your Query Is Slow Right Now: Your Server Is Overloaded

Oh Baby Another Series

Don’t you love lists? I love lists. They’re so nice and tidy, like a RegEx. Just the other night I was thinking how nice it’d be to write a RegeEx to detect RegEx patterns.

I didn’t sleep.

Anyway, SQL Server. Things. Things running slowly. What’s up with that?

In these posts, I’ll walk through some common scenarios, and how to look for them.

It might even be in the EXACT SAME VERY ORDER I LOOK FOR THEM!

Put your money away.

Reason 1: The Server Is Overloaded

If you go to open a new tab in SSMS and you hear a creaking sound, it might be time to revisit some of your hardware choices.

But that’s a problem for future you. Your problem is right now.

  • How to check it: sp_PressureDetector
  • What to look for: Stuck queries, Resource Usage, Wait Stats

By default, it’ll look at both CPU and Memory counters. If you don’t know what’s going on, just hit F5.

EXEC dbo.sp_PressureDetector;

If you’re hitting memory limits, things will look like this:


are you ready?

Some queries will have requested memory, but it won’t have been granted.


Waiting queries will be stuck in a queue, waiting on RESOURCE_SEMAPHORE.

we’re having a good time

A significant amount of memory will have been granted out, and available memory will be on the lower side. You’ll also probably see the waiter_count column

If you’re hitting CPU limits, things will look like this:


negative creep

Available threads might be a negative, or very low number. Requests may be waiting on threads, and the number will pretty closely match the number of rows that are…


Waiting in the pool.

she works hard for the money

This’ll also show you queries that are running, and which ones are using the most threads.

Can You Fix It Quickly?

Maybe, maybe not. If you’re allowed to kill off queries, you might be able to right the ship now. If not, you’re stuck waiting for queries to finish and give back their resources.

Longer term, you have a long checklist to go through, including asking tough questions about hardware, settings memory and parallelism settings, and your queries and indexes. This script will give you a lot of information about what’s going on. It’s up to you to figure out why.

If you need help with this sort of thing, drop me a line.

Thanks for reading!

Columnstore Loading on Eight Socket Servers

I had a brief opportunity to do SQL Server workload testing on an eight socket server. It didn’t go well.


I’ll give an extremely brief introduction to NUMA and sockets because I have a bit more free time these days. You’re probably reading this blog post on a machine with a single socket. All that means is that there’s a single CPU chip plugged into the motherboard. All of the RAM for the machine is local to that CPU chip. The recent king of single socket performance for SQL Server is the Intel 8280. 28 cores at 2.7 GHz with the latest instruction sets is quite a lot of processing power. There’s a newer chip out there now but it’s not clear to me if anyone can buy it yet.

What can you do if a workload can’t be scaled out and it needs more than the CPU power available with a single socket solution? A two socket solution could be the answer. A two socket server has two CPU chips plugged into different slots on a motherboard and each CPU has a local bank of RAM. Any part of memory can be access by either CPU but there is a performance penalty for foreign memory access, which is just accessing memory from a different CPU. See the simple diagram below:

As mentioned earlier, it is more expensive for CPU0 to access memory local to CPU1 compared to its own local memory. The performance penalty of NUMA will depend on the workload as well as the hardware. It can range from not noticeable at all to a serious issue that needs to be addressed in some way. In some situations this can be as easy as convincing your VM admin to change how the VM is set up. In others, your best bet may be to figure out a way to keep the workload on a single socket. Some folks advocate for single socket solutions compared to two socket solutions for OLTP workloads due to the NUMA penalty.

Perhaps you are wondering what happens if a workload needs more power available from a two socket server? Enter the four socket server. Four socket servers have four slots for CPU chips. An Intel 8280 processor in a four socket configuration gives you a total of 112 physical cores. That’s a lot. Four socket servers are more expensive than two socket servers and are significantly less common to find in data centers. Often, they are a custom order which makes months for the hardware vendor to build and deliver. In addition, the connections between CPUs are more complicated. Below are two common configurations:

Under configuration A, it takes two hops for CPU0 to access memory local to CPU3. That will increase latency more than going to CPU1 or CPU2. However, memory latency between CPUs won’t be the same for all pairs even in configuration B. In addition, it’s more complicated for SQL Server to try to manage memory on a four socket box. Even in the best case scenario for memory management, by design everything stored in SQL Server owned memory isn’t NUMA aware, such as the columnstore object pool. Nonetheless, if you need more CPU than a two socket server or VM can provide then you’re stuck living with the complexities of a four socket solution.

What if you need even more power? Hardware manufacturers once again have you covered with eight socket servers. These are even more expensive and uncommon than four socket solutions. Different vendors handle NUMA differently. There are simply more possible permutations with eight CPUs compared to four. I’m told that the following is an example layout for an eight socket machine:

I’m not a NUMA expert by any means. However, I think it’s fair to say that it looks a lot more complicated. There are many more instances where accessing foreign memory will take two hops instead of one. I also don’t see a way to carve out a four socket VM with all CPUs just one hop away. All in all, that diagram makes me nervous. As a final note, eight socket machines are not the limit of what’s possible, but I’ll stop here because I’ve never run code on anything larger than eight sockets.

Test Setup

I elected to use a high concurrency CCI insert workload to compare performance between a four socket VM and an eight socket VM. Quite conveniently, I already had a test columnstore workload that I knew pushed the SQL Server scalability limits in terms of memory management. To perform the threading I used the SQL Server Multi Thread open source framework. I wanted all sessions to go to their own schedulers. That could have been tough to manage with tests up to 200 threads but the threading framework handles that automatically.

For those following along at home, testing was done with SQL Server 2019 with LPIM and TF 876 enabled. Guest VMs were built with VMware with Windows Server 2019 installed. The four and eight socket VMs were created on the same physical host with about 5.5 TB of RAM available to the guest OS in both configurations.

Test Results

In case you didn’t click the earlier link, each job inserts about 10.4 million rows into a table with a clustered columnstore index. The smallest test ran just a single thread. The largest test on the four socket VM ran 100 concurrent jobs for a total insert volume of a billion rows. The largest test on the eight socket VM ran 200 concurrent jobs for a total insert volume of two billion rows. In all test scenarios, the four socket VM performed more work per second than the eight socket VM:

Quite a disappointing result. Double your SQL Server license fees for a slower overall system! There are the expected memory related wait stats for the 200 thread result:

I grabbed a sample of callstacks with PerfView:

Drilling into the top stack shows that decommitting memory is a scalability issue:

The stacks for the 100 thread test on the four socket VM results are a bit surprising by comparison:

Overall CPU reported by PerfView is the same. Perhaps this is a limitation of the tool. However, it really does seem like all of those extra cores simply aren’t helping under the eight socket configuration. Throughput goes down instead of up. I had very limited time with this machine and don’t want to repeat much of a previous blog post, so I’ll stop the technical analysis here. It’s likely that nothing can really be done to improve the workload except large configuration or hardware changes.

If I had to make this workload perform better, I would directly engage with the vendors. If that didn’t work, I would try a bare metal configuration, SQL Server on Linux, or running on an eight socket machine from a different vendor. I think that it’s fair to say that the test results would be different, but I can’t speculate as to whether they would be better or worse or what the difference would be.

Final Thoughts

We live in a wonderful era of hardware prosperity. With a big enough checkbook, you can run your SQL Server workloads on servers with hundreds of CPU cores. However, proceed with extreme caution if you’re considering moving a SQL Server workload to a server or VM with more than four sockets. It might not scale as well as you’re hoping. Thanks for reading!