Running Query Confusion

Somewhat Solved


This bit of confusion is largely solved in SQL Server 2019 under compatibility level 150, when Froid (scalar UDF inlining) kicks in.

But, you know, we’re a ways off from 2019 dropping, being adopted, and compat level 150 being the operating norm.

So here goes!

Functional Querying


I’ve got a scalar valued function. What it does is unimportant, but I’m calling it in a query like this:

SELECT u.DisplayName, 
       dbo.TotalScore(u.Id) AS TotalScore --< function
FROM dbo.Users AS u
WHERE u.Reputation >= 200000
ORDER BY u.Id;
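
In case you want to play along, here’s a hypothetical stand-in definition; the real one could be anything that touches a table:

CREATE OR ALTER FUNCTION dbo.TotalScore (@UserId INT)
RETURNS BIGINT
AS
BEGIN
    /* Sum up a user's post scores; any data access will reproduce the confusion */
    DECLARE @Total BIGINT;

    SELECT @Total = SUM(p.Score)
    FROM   dbo.Posts AS p
    WHERE  p.OwnerUserId = @UserId;

    RETURN @Total;
END;
GO

Now, back to the calling query.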

When I run this in SSMS, it’s obvious to us what’s going on.

But if I’m watching what’s happening on a server using sp_WhoIsActive, what’s going on might not be obvious.

I’m doing all this with just my query running to show how confusing things can get.

First Confusion: Query Text

Foggy

This doesn’t look at all like the text of our query. We can guess that it’s the function running in the select list since we know what we’re doing, but, you know…

We can bring some clarity by running sp_WhoIsActive like this:

EXEC sp_WhoIsActive @get_plans = 1, 
                    @get_outer_command = 1;

The outer command parameter will show us the query calling the function, which’ll look more familiar.

Headline News

Second Confusion: Phantom Parallelism

We’re hitting more of those harmless, silly little CXCONSUMER waits.

But how? Our query plan is serial!

Glam Chowder

This part is a little less obvious, but if we get an estimated plan for our query, or track down the query plan for the function, things start to make sense.

Questionable Taco

The query plan for the function is parallel — a cute ~nuance~ about scalar UDFs is that they only prevent the query calling them from going parallel.

The function itself can go parallel. So that’s… nice.

I guess.
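
If you want to fish the function’s plan out of the cache yourself, sys.dm_exec_function_stats (SQL Server 2016 and up) can do it. A minimal sketch, assuming the hypothetical function from earlier:

SELECT      OBJECT_NAME(defs.object_id) AS function_name,
            deqp.query_plan
FROM        sys.dm_exec_function_stats AS defs
CROSS APPLY sys.dm_exec_query_plan(defs.plan_handle) AS deqp
WHERE       defs.object_id = OBJECT_ID(N'dbo.TotalScore');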

They Walked Inlined


In compat level 150, things are clearer.
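
If you’re playing along, here’s a minimal sketch for flipping the switch, with StackOverflow standing in for your database name; the is_inlineable column tells you whether a function qualifies:

ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 150;

SELECT OBJECT_NAME(sm.object_id) AS function_name,
       sm.is_inlineable /* 1 = Froid can inline it */
FROM   sys.sql_modules AS sm
WHERE  sm.object_id = OBJECT_ID(N'dbo.TotalScore');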

CLRLY

The inner and outer text are the same. There’s more of that CXCONSUMER, though. Hoowee.

Might as well jump.

But at least now we have a query plan that matches the parallel waits, right?

In the next post, we’re gonna talk more about those wait stats, though.

Thanks for reading!

What Are Your Weirdest Waits?

Common People


I know, you probably have questions about CXPACKET.

Maybe you look at a wait stats report and nod sagely at the caps lock extravaganza of variably descriptive names.

The unformatted numbers that you tally up on your fingers, and then have to look up what a 12 digit number even is.

When you search around, you can figure out what most of your waits are, and why they happen. You might even be able to figure out how to fix them.

(I know, you’re gonna read advice on a blog from 2003 telling you to set “MaxDOP” to 1, but forchrissakes don’t do it.)

What I’m looking for are waits that pop up, but you haven’t been able to find answers to. The real nut crackers.

If you’ve got those, leave a comment and let me know.

Hey Query, What Were You Waiting On?

Surreal Plans


In a parallel plan with an eager index spool, we can observe EXECSYNC waits for the duration of the spool being built.

In a serial plan, we’re not so lucky. There’s no obvious wait that indicates we built an index.

Let’s run a familiar query:

SELECT      TOP (10)
            u.DisplayName,
            u.Reputation,
            ca.*
FROM        dbo.Users AS u
CROSS APPLY
(
    SELECT   TOP (1)
             p.Score
    FROM     dbo.Posts AS p
    WHERE    p.OwnerUserId = u.Id
    AND      p.PostTypeId = 1
    ORDER BY p.Score DESC
) AS ca
ORDER BY    u.Reputation DESC;

The plan is fully serial:

Meat Lovers

Looking at the actual plan in SSMS 18, it’s obvious what took a long time.

If we’re not so lucky, and we have a cached plan, it would be less obvious:

Back Like That

Look how bad scans are! Ha ha ha!

😒

What’s A Wait?


If you wanna figure out the runtime, you have to do some math.

What times what?

If you take the count of SOS_SCHEDULER_YIELD waits (12,444) and multiply it by the 4ms quantum, you land just about on the actual runtime: 12,444 × 4ms ≈ 49.8 seconds.

We’re lucky on my laptop that our query isn’t competing with other queries for processor time, so there’s not a long queue to get back on a CPU on each yield.

It’s nice to be able to see this while we’re watching a query, but if we come across it in the plan cache, or if we were looking at wait stats, what would we make of the problem?

Surely an operator that’s only 13% of the plan cost couldn’t be responsible for all that.

😒😒

But There We Have It


A query that runs for a non-trivial amount of time, emits common wait stats, and doesn’t ask for an index when it’s creating one.

And that index creation is what makes up 99% of the execution time.

Thanks for reading!

😒😒😒

Eager Index Spool Annoyances

Don’t Mask Spools


Certain spools in SQL Server can be counterproductive, though well intentioned.

In this case, I don’t mean that “if the spool weren’t there, the query would be faster”.

I mean that… Well, let’s just go look.

Bad Enough Plan Found


Let’s take this query.

SELECT TOP (50)
       u.DisplayName,
       u.Reputation,
       ca.*
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT TOP (10)
           p.Id,
           p.Score,
           p.Title
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
    AND   p.PostTypeId = 1
    ORDER BY p.Score DESC
) AS ca
ORDER BY u.Reputation DESC;

Top N per group is a common enough need.

If it’s not, don’t tell Itzik. He’ll be heartbroken.

The query plan looks like this:

Wig Billy

Thanks to the new operator times in SSMS 18, we can see exactly where the chokepoint in this query is.

Building and reading from the eager index spool takes 70 wall clock seconds. Remember that in row mode plans, operator times aggregate across branches, so the 10 seconds on the clustered index scan is included in the index spool time.

One thing I want to point out is that even though the plan says it’s parallel, the spool is built single threaded.

One Sided

Reading data from the clustered index on the Posts table and putting it into the index spool all happens on Thread 2.

If we look at the wait stats generated by this query, a full 242 seconds are spent on EXECSYNC.

Armless

The math mostly works out: four threads each wait roughly a minute on the spool being built, and 4 × ~60.5 seconds gets you to 242.

Even though the scan of the clustered index is serial, reading from the spool occurs in parallel.

Spange

Connected


Eager index spools are built per-query, and discarded afterwards. When built for large tables, they can represent quite a bit of work.

In this example query, a 17 million row index is built, and that’ll happen every single time the query executes.

While I’m all on board with the intent behind the index spool, the execution is pretty brutal. Much of query tuning is situational, but I’ll always pay attention to an index spool (especially because you won’t get a missing index request for them anywhere). You’ll wanna look at the spool definition, and potentially create a permanent index to address the issue.
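
For the example query, a permanent index might look something like the sketch below. The key and include columns are my reading of the query’s predicates, not a copy of the spool definition, so check yours against the plan:

CREATE NONCLUSTERED INDEX ix_Posts_OwnerUserId_PostTypeId_Score
ON dbo.Posts (OwnerUserId, PostTypeId, Score DESC)
INCLUDE (Title)
WITH (SORT_IN_TEMPDB = ON);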

As for EXECSYNC waits, they can be generated by other things, too. If you’re seeing a lot of them, I’m willing to bet you’ll also find parallel queries with spools in them.

Thanks for reading!

The Waiting Life Of A Query

Smoking Ain’t Allowed In School


While working on my new training, part of what I want to show people is how to match what happens in a query plan to the type of wait stats they’ll see on their servers.

Take this query plan for example:

Nose Boop™

There are THINGS GOING ON HERE! Scans, parallelism, sorting, spilling.

What do the waits look like for this query?

Spy vs Spy


It’s fairly easy to focus in on what a session is doing using sys.dm_exec_session_wait_stats, though there are some caveats about when the numbers reset.
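
A minimal sketch, where 55 is a stand-in for whatever session_id your query window has (SELECT @@SPID will tell you):

SELECT   desws.wait_type,
         desws.waiting_tasks_count,
         desws.wait_time_ms,
         desws.signal_wait_time_ms
FROM     sys.dm_exec_session_wait_stats AS desws
WHERE    desws.session_id = 55 /* stand-in session_id */
ORDER BY desws.wait_time_ms DESC;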

On my laptop, I can just open a new SSMS tab without worrying about connection pooling or any weird app stuff.

When I do, these are the waits I see while the query runs.

Every single one of those waits is accounted for by something that happened in the query plan (or its attributes).

Breakdown


  • Our pages weren’t in memory, so we read them from disk (PAGEIOLATCH_SH)
  • We had to allocate buffer pool memory (MEMORY_ALLOCATION_EXT)
  • We used parallelism (both CX waits)
  • We needed memory to sort data (RESERVED_MEMORY_ALLOCATION_EXT)
  • We didn’t get enough, so we spilled (IO_COMPLETION)
  • We returned data to SSMS (ASYNC_NETWORK_IO)

LATCH_EX can be a lot of things. In a parallel plan, it can be associated with exchange buffers and the parallel page supplier.

The goal here is twofold: to look at a server’s wait stats and come up with a mental image of the types of queries you’re going to see, and to tie query plan operators and attributes back to wait stats.

Thanks for reading!

Stop Looking At Wait Stats Without Looking At Uptime

Economy Of Waits


There’s a joke about two economists walking down the street.

One of them asks the other how they’re doing.

The punchline is that their response is “compared to what?”

It’s not the best joke, but it’s something to keep in mind when you’re measuring anything, SQL Server especially.

This isn’t a post about collecting baselines, though it’s a relevant concept.

Scenery, Yo


One of the best ways to find bottlenecks in SQL Server is to look at wait stats.

Lots of scripts and monitoring tools will show you top waits, percentages, signal waits, and even percentages of signal waits.

Oh baby, those datapoints.

But there’s frequently a missing axis: compared to what?

Weakly Links


Let’s say you’ve got 604,800 seconds of CXPACKET waits.

Let’s also say they’re 95% of your total server wait stats.

How does your opinion of that number change if your server has been up for:

  • One Day (86,400 seconds)
  • One Week (604,800 seconds)
  • One Month (2,592,000 seconds)
  • One Year (31,536,000 seconds)

Obviously, if your server has been up for a day, you might wanna pay more attention to that metric: that’s 604,800 ÷ 86,400 = 7 seconds of parallelism waits piling up for every second of uptime.

If your server has been up for two weeks, it becomes less of an issue.

Seven Year Abs


I’ll give you another example: OH MY GOD YOU ATE 20,000 CALORIES.

  • In a day, that might be cause for concern
  • In a week, you’re about average
  • In a month, you might need medical attention
  • In a year, well, you’re probably more calorically important to worms

“Compared to what” is a pretty important measure.

Forced Perspective


I get it. Someone can clear out wait stats, judging uptime can be unreliable, and it all gets more difficult up in the cloud.

Looking at wait stats without knowing the period of time they were collected over isn’t terribly helpful.
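
Getting a rough denominator is easy enough. A minimal sketch that puts a wait next to instance uptime, keeping in mind that someone may have cleared wait stats more recently than the last restart:

SELECT     dows.wait_type,
           dows.wait_time_ms / 1000. AS wait_time_seconds,
           DATEDIFF(SECOND, dosi.sqlserver_start_time, SYSDATETIME()) AS uptime_seconds
FROM       sys.dm_os_wait_stats AS dows
CROSS JOIN sys.dm_os_sys_info AS dosi
WHERE      dows.wait_type = N'CXPACKET';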

I’d opened an issue to at least separate wait stats by database, though Microsoft doesn’t seem to be too into my idea.

Thanks for reading!

Troubleshooting THREADPOOL With The Plan Cache

You Come And Go


When THREADPOOL strikes, even the best monitoring tools can have a bunch of blank spots hanging around in them.

If you’re on SQL Server 2016 or better, there are some helpful columns in sys.dm_exec_query_stats.

WITH threads
    AS
     (
         SELECT   TOP (10)
                  deqs.sql_handle,
                  deqs.plan_handle,
                  deqs.total_reserved_threads,
                  deqs.last_reserved_threads,
                  deqs.min_reserved_threads,
                  deqs.max_reserved_threads,
                  deqs.total_used_threads,
                  deqs.last_used_threads,
                  deqs.min_used_threads,
                  deqs.max_used_threads,
                  deqs.execution_count
         FROM     sys.dm_exec_query_stats AS deqs
         WHERE    deqs.min_reserved_threads > 0
         ORDER BY deqs.max_reserved_threads DESC
     )
SELECT      t.execution_count,
            t.total_reserved_threads,
            t.last_reserved_threads,
            t.min_reserved_threads,
            t.max_reserved_threads,
            t.total_used_threads,
            t.last_used_threads,
            t.min_used_threads,
            t.max_used_threads,
            CASE WHEN (t.min_reserved_threads * 2) < t.max_reserved_threads 
                 THEN 'maybe'
                 ELSE 'maybe not'
            END AS [sniffy?],
            d.query_plan
FROM        threads AS t
CROSS APPLY sys.dm_exec_query_plan(t.plan_handle) AS d
ORDER BY    t.execution_count DESC, t.max_used_threads DESC;

On My Laptop


Which doesn’t have anything terribly interesting on it at the moment, I get this back.

Egg salad

I’ve weighted the results towards what’s executed the most, then by how many threads the queries wanted to reserve.

This could give you a good idea about which queries contributed to THREADPOOL problems.

Keep in mind that, regardless of how many threads the queries end up using, they’ll reserve as many as they want (unless DOP is downgraded due to CPU pressure).

If they get blocked, they hold onto those threads.

If those threads aren’t available when they start running, they’ll either wait for them to show up, or run at a lower DOP.

What Should You Do Here?


  • Take a look at the query plans: Are there any home dunk missing indexes?
  • Take a look at the query text: Is there some obvious problem, like a non-SARGable predicate, that you can fix?
  • Take a look at your server: Are MAXDOP and CTFP set correctly? (There’s a quick check in the sketch after this list.)
  • Take a look at the predicates: Is there some bad parameter sniffing going on? (Check the sniffy? column for big swings in threads.)
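
For the server settings bullet, a minimal sketch:

SELECT c.name,
       c.value_in_use
FROM   sys.configurations AS c
WHERE  c.name IN (N'max degree of parallelism', N'cost threshold for parallelism');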

Thanks for reading!