Reasons Your Query Is Slow Right Now: A Bad Plan Appears

Update Stats, They Said


Okay look, you probably should update stats. At least when you do it, you have some control over the situation.

If you let SQL Server get up to its own devices, you might become quite surprised.

One after-effect of updated stats is, potentially, query plan invalidation. When that happens, SQL Server might get hard to work coming up with a new plan that makes sense based on these new statistics.

And that, dear friends, is where things can go bad.

New Contributor 👋


Let’s say we have this query, which returns the average post and comment score for a single user.

CREATE OR ALTER PROCEDURE dbo.AveragePostScore(@UserId INT)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

SELECT u.DisplayName, 
       AVG(p.Score * 1.) AS lmao_p,
       AVG(c.Score * 1.) AS lmao_c
FROM   dbo.Users AS u
JOIN   dbo.Posts AS p
    ON p.OwnerUserId = u.Id
JOIN   dbo.Comments AS c
    ON c.UserId = u.Id
WHERE u.Id = @UserId
GROUP BY u.DisplayName;

END;
GO

Most of the time, the query runs fast enough for the occasional run to not end too poorly.

But then a recompile happens, and a new contributor decides to look at their profile.

Okay To Worse


What comes next you could fill a textbook with.

EXEC dbo.AveragePostScore @UserId = 3150367;

A new plan gets compiled:

wouldn’t get far

And you know, it looks great for a new user.

And you know, it looks not so great for a slightly more seasoned user.

you shouldn’t have!

So What Changed?


Running the query first for a user with a bit more site history gives us a plan with a very different shape, that finishes in under 2 seconds. Repeating that plan for less experienced users doesn’t cause any problems. It finishes in very little time at all.

JERN ERDR

The plan itself remains largely more familiar than most parameter sniffing scenarios wind up. There are plenty more similarities than differences. It really does just come down to join order here.

Alright, now we know what happened. How would we figure out if this happened to us IRL?

I Shot The Trouble


We can do what we did yesterday, and run sp_BlitzFirst. That’ll warn us if stats recently got updated.

EXEC sp_BlitzFirst @Seconds = 10, @ExpertMode = 1;

If we want to try to validate if that stats update messed with a plan, we can use sp_WhoIsActive:

EXEC sp_WhoIsActive @get_plans = 1, @get_avg_time = 1;

The get_avg_time parameter is one I’ve talked about before, and in this case it’s quite helpful.

We’ve got a bunch of queries that usually run in 0 seconds running for at least 20 seconds.

bonked out

Fingerprinted


Since we used get_plans here too, we can grab the execution plan, and see which parameters were used on compilation and execution.

Get the properties of the select operator, either by right clicking and selecting properties, or hitting F4.

not yet

Now we know two things, and can test four things:

  • If we run this query using the compiled value, is it fast?
  • If we run this query using the runtime value, is it slow?
  • If we recompile and run it for the runtime value, is it fast or slow?
  • Is the compile time value still fast when it uses the “new” plan?

This is a little easier with stored procs, because you can do something like:

EXEC sys.sp_recompile @objname = N'dbo.AveragePostScore';

To use DBCC FREEPROCCACHE to target a specific query, you need the sql handle or plan handle. You don’t wanna jump off and clear the whole cache here, unless you’re desperate. Just make sure you understand that you might fix one query, and break others, if you clear the whole thing.

It’s better to be targeted when possible.

And of course, if you’ve got Query Store up and running, you may do well to look at Regressed or High Variance query views there, and force the faster plan.

Thanks for reading!

Reasons Your Query Is Slow Right Now: Blocking, Blocking, Blocking

Lockzilla


Blocking sucks. SQL Server should have used an optimistic isolation level by default.

Sure, writers would still block each other, but think of all the time people could have saved not explaining NOLOCK hints.

  • Yes, they still take locks
  • Yes, it’s the same as READ UNCOMMITTED
  • Yes, sometimes they make queries faster without blocking

Wait, what?

Moving Right Along


There are some interesting things to say about blocking, I suppose, aside from that it sucks.

For instance, it can look really weird when there’s parallelism. And read queries can block write queries.

There, I’ve said interesting things about blocking. It still sucks.

But how do you diagnose it?

First, close Activity Monitor.

Doctor, Doctor


My favorite tool for diagnosing blocking is sp_WhoIsActive, and I’m going to show you three different ways to look at it.

The safest way

EXEC sp_WhoIsActive @get_task_info = 2, @get_additional_info = 1;

This isn’t the way most people do it, but it is the least impactful.

You get back the normal set of results:

peachy

Cool, you can see queries, lock waits, and blocking sessions. But a short scroll to the right also brings you to this additional_info clickable column:

oily water

It’s not so helpful for the query doing the blocking, but it’ll tell you what the queries being blocked are stuck on.

The usual way

EXEC sp_WhoIsActive @get_locks = 1;

This is the way I normally demo looking for blocking with it, because it is more useful to see what the blocking query is doing.

But I’ve also had it be slow. Really, really slow.

That can happen when there is JUST SO MUCH BLOCKING that it takes a long time to enumerate all of it.

But you’ll get back this lovely sort of detail:

say it loud

Page locks. 102. Great.

The smart way

EXEC sp_WhoIsActive @find_block_leaders = 1, @sort_order = '[blocked_session_count] DESC';

It’s not always obvious by runtime which session is causing blocking, so you can use this command to find blocking chains, and order output by who’s doing the most blocking.

troubleshot

Fixins


When you need to troubleshoot live blocking, these are the ways I usually check in on things with sp_WhoIsActive. It’s hard to beat.

Of course, long term, a monitoring tool, or logging the blocked process report to a table can help you when you’re not there hitting F5 live and in-person.

We’ve peeled back a couple scenarios where oddball things can slow a server down. Tomorrow we’ll look at a new one!

What’ll it be?

Your guess is not as good as mine.

Thanks for reading!

Underused sp_WhoIsActive capabilities: @delta_interval

Great Ghost


In the last post, we looked at how to see if your query is misbehaving, or if things are running normally.

Note that I didn’t say “good” here. Just normal.

For you.

What Have You Done For Me Lately?


Using sp_WhoIsActive in a slightly different way, we can see what a query has been up to over a duration of our choosing.

The delta columns are what happened over the span of time we pick. The columns that come back normally are cumulative to where the query is at now.

So for a 5 seconds interval:

EXEC sp_WhoIsActive @delta_interval = 5;
Stacked

I’m stacking a couple screens shots from the normal single line of results, because they’re a little too wide all at once.

What deltas can help you figure out is how “active” your query really is.

In other words, it can answer the question: is my query stuck, or is my query doing something?

As an example, here’s a query that opened a transaction, ran an update, and then… sat there.

Over all, we’ve done 4.4 million reads, 80k writes, and used about 7k ms of CPU.

But in the last five seconds, we haven’t done anything at all.

Thanks for reading!

Underused sp_WhoIsActive capabilities: @get_avg_time

Ablazemalls


Look, it’s impressive enough that sp_WhoIsActive works as well as it does. Most people who’ve been using it for a while are familiar with using @get_plans to retrieve query plans, or @get_locks to see locking information. I’m gonna spend a couple posts talking about less well known flags that I really like.

We’re gonna start off with one that can help you figure out if a particular query is misbehaving, or if things are just cruising along normally.

Nipples


I’ve got a stored procedure. What it does is unimportant. What’s important is that it’s sensitive to parameter sniffing.

We may not know this walking into an emergency, but we can figure it out pretty quickly using sp_WhoIsActive.

EXEC sp_WhoIsActive @get_avg_time = 1;
Breathe Easy

This’ll give us back the usual information about how long our query has been running, but now right next to it is another column that ends in (avg).

When you use this switch, sp_WhoIsActive goes into the plan cache and looks at how long a particular query runs for on average.

If that number is much lower than how long the query has been running for, and it isn’t being blocked, you just might have a case of parameter sniffing on your hands.

Thanks for reading!