Join Me At Data Platform Summit 2020!

The Road From Nowhere


This year, I’m teaching an 8-hour online workshop at Data Platform Summit, and I’d love it if you joined me.

Here’s what I’ll be teaching:

Class Title: The Beginner’s Guide To Advanced Performance Tuning

Abstract: You’re new to SQL Server, and more and more, your job is to fix performance problems, but you don’t know where to start.

You’ve been looking at queries, and query plans, and puzzling over indexes for a year or two, but it’s still not making a lot of sense.

Beyond that, you’re not sure how to measure whether your changes are working, or whether they’re even the right changes to make.

In this full day performance tuning extravaganza, you’ll learn about all the most common anti-patterns in T-SQL querying and indexing, and how to spot them using execution plans. You’ll also leave knowing why they cause the problems that they do, and how you can solve them quickly and painlessly.

If you want to gain the knowledge and confidence to tune queries so they’ll never be slow again, this is the training you need.

Date: Dec 7 & 8.

Time: 12 PM to 4 PM EST (View in your timezone)

Tickets: Tickets here!

Please Stop Filtering CXCONSUMER Out Of Actual Execution Plans

I’m Begging You, Here


Ever since SSMS started collecting wait stats in query plans, I’ve been mad about a couple things that get filtered out:

  • Lock waits
  • CXCONSUMER waits

Lock waits are particularly annoying. Imagine (I know, this might be difficult) that you have a friend who is puzzled by why a query is sometimes slow.

They send you an actual plan for when it’s fast, and an actual plan for when it’s slow. You compare them in every which way, and everything except duration is identical.

It’d be a whole lot easier to answer them if LCK waits were collected, but hey. Let’s just make them jump through another hoop to figure out what’s going on.

CXCONSUMER has a similar problem — and here’s the thing — if people are going to the trouble of collecting this information, give them what they ask for. Don’t just give them what you think is a good idea.

Highly Affected People


Let’s look at a query where all the parallel work gets skewed to one thread.

SELECT u.Id, 
       u.DisplayName, 
       u.Reputation, 
       u.CreationDate, 
       ca.*
FROM dbo.Users AS u
OUTER APPLY
    (
        SELECT *, 
        DENSE_RANK() 
              OVER( PARTITION BY vs.Id 
                    ORDER BY vs.Id DESC ) AS whatever
        FROM dbo.VotesSkewed AS vs
        WHERE vs.UserId = u.Id
        AND   vs.VoteTypeId BETWEEN 1 AND 4
    ) AS ca
WHERE ca.whatever = 0;

It runs for 42 seconds!

slowpoke

But the only wait collected is SOS_SCHEDULER_YIELD. For 392 ms.

o okay

Different Angle


If we watch the server’s wait stats while the query runs, we see a totally different story.

EXEC sp_BlitzFirst @Seconds = 50, @ExpertMode = 1;
uh huh

We had four waits on CXCONSUMER that all lasted nearly 11 seconds. Which of course happens because all the rows end up on one thread.

bumpin

Sure, that’s easy enough to spot if you go looking, but having CXCONSUMER waits collected would make it a whole lot easier to know what to look for.
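
If you’d rather catch it in the act yourself, a quick look at sys.dm_os_waiting_tasks while the query is running will show the consumer threads stacking up behind one busy producer. A rough sketch, not the exact query behind these screenshots:

SELECT owt.session_id,
       owt.exec_context_id,
       owt.wait_type,
       owt.wait_duration_ms,
       owt.blocking_exec_context_id
FROM sys.dm_os_waiting_tasks AS owt
WHERE owt.wait_type = N'CXCONSUMER'
ORDER BY owt.wait_duration_ms DESC;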

That is surely a significant enough wait to include. And don’t tell me it’s not actionable, because I spend a good chunk of time fixing problems like this.

Thanks for reading!

Reasons Your Query Is Slow Right Now: Blocking, Blocking, Blocking

Lockzilla


Blocking sucks. SQL Server should have used an optimistic isolation level by default.

Sure, writers would still block each other, but think of all the time people could have saved not explaining NOLOCK hints.

  • Yes, they still take locks
  • Yes, it’s the same as READ UNCOMMITTED
  • Yes, sometimes they make queries faster without blocking

Wait, what?
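
For what it’s worth, the optimistic setup I wish were the default is one database option away. A minimal sketch (the database name is a placeholder, and ROLLBACK IMMEDIATE will kick out in-flight transactions, so don’t run it mid-business-day):

ALTER DATABASE YourDatabase 
    SET READ_COMMITTED_SNAPSHOT ON 
    WITH ROLLBACK IMMEDIATE;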

Moving Right Along


There are some interesting things to say about blocking, I suppose, aside from that it sucks.

For instance, it can look really weird when there’s parallelism. And read queries can block write queries.

There, I’ve said interesting things about blocking. It still sucks.

But how do you diagnose it?

First, close Activity Monitor.

Doctor, Doctor


My favorite tool for diagnosing blocking is sp_WhoIsActive, and I’m going to show you three different ways to look at it.

The safest way

EXEC sp_WhoIsActive @get_task_info = 2, @get_additional_info = 1;

This isn’t the way most people do it, but it is the least impactful.

You get back the normal set of results:

peachy

Cool, you can see queries, lock waits, and blocking sessions. But a short scroll to the right also brings you to this additional_info clickable column:

oily water

It’s not so helpful for the query doing the blocking, but it’ll tell you what the queries being blocked are stuck on.

The usual way

EXEC sp_WhoIsActive @get_locks = 1;

This is the way I normally demo looking for blocking with it, because it is more useful to see what the blocking query is doing.

But I’ve also had it be slow. Really, really slow.

That can happen when there is JUST SO MUCH BLOCKING that it takes a long time to enumerate all of it.

But you’ll get back this lovely sort of detail:

say it loud

Page locks. 102. Great.

The smart way

EXEC sp_WhoIsActive @find_block_leaders = 1, @sort_order = '[blocked_session_count] DESC';

It’s not always obvious by runtime which session is causing blocking, so you can use this command to find blocking chains, and order output by who’s doing the most blocking.

troubleshot

Fixins


When you need to troubleshoot live blocking, these are the ways I usually check in on things with sp_WhoIsActive. It’s hard to beat.

Of course, long term, a monitoring tool or logging the blocked process report to a table can help you out when you’re not there hitting F5 live and in person.
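
If logging the blocked process report is the route you take, the setup is just a threshold plus an Extended Events session to catch it. A bare-bones sketch (the session name and file target are placeholders, and five seconds is an arbitrary threshold):

EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'blocked process threshold (s)', 5;
RECONFIGURE;

CREATE EVENT SESSION blocked_process_report ON SERVER
ADD EVENT sqlserver.blocked_process_report
ADD TARGET package0.event_file (SET filename = N'blocked_process_report');

ALTER EVENT SESSION blocked_process_report ON SERVER STATE = START;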

We’ve peeled back a couple scenarios where oddball things can slow a server down. Tomorrow we’ll look at a new one!

What’ll it be?

Your guess is not as good as mine.

Thanks for reading!

Reasons Your Query Is Slow Right Now: Your Server Is Overloaded

Oh Baby Another Series


Don’t you love lists? I love lists. They’re so nice and tidy, like a RegEx. Just the other night I was thinking how nice it’d be to write a RegEx to detect RegEx patterns.

I didn’t sleep.

Anyway, SQL Server. Things. Things running slowly. What’s up with that?

In these posts, I’ll walk through some common scenarios, and how to look for them.

It might even be in the EXACT SAME VERY ORDER I LOOK FOR THEM!

Put your money away.

Reason 1: The Server Is Overloaded


If you go to open a new tab in SSMS and you hear a creaking sound, it might be time to revisit some of your hardware choices.

But that’s a problem for future you. Your problem is right now.

  • How to check it: sp_PressureDetector
  • What to look for: Stuck queries, Resource Usage, Wait Stats

By default, it’ll look at both CPU and Memory counters. If you don’t know what’s going on, just hit F5.

EXEC dbo.sp_PressureDetector;

If you’re hitting memory limits, things will look like this:


 

are you ready?

Some queries will have requested memory, but it won’t have been granted.

payday

Waiting queries will be stuck in a queue, waiting on RESOURCE_SEMAPHORE.

we’re having a good time

A significant amount of memory will have been granted out, and available memory will be on the lower side. You’ll also probably see the waiter_count column showing how many queries are stuck waiting in the grant queue.
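
If you want to see the grant queue without any stored procedures at all, sys.dm_exec_query_memory_grants tells the same story. A quick sketch (queries still waiting show NULL in granted_memory_kb):

SELECT deqmg.session_id,
       deqmg.requested_memory_kb,
       deqmg.granted_memory_kb,
       deqmg.wait_time_ms,
       deqmg.queue_id
FROM sys.dm_exec_query_memory_grants AS deqmg
ORDER BY deqmg.requested_memory_kb DESC;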

If you’re hitting CPU limits, things will look like this:


 

negative creep

Available threads might be a negative number, or a very low one. Requests may be waiting on threads, and the number will pretty closely match the number of rows that are…

bum

Waiting in the pool.

she works hard for the money

This’ll also show you queries that are running, and which ones are using the most threads.
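
To double check the thread situation straight from the DMVs, work_queue_count in sys.dm_os_schedulers is the tell for tasks parked and waiting on a worker. A rough sketch:

SELECT dos.scheduler_id,
       dos.current_tasks_count,
       dos.runnable_tasks_count,
       dos.active_workers_count,
       dos.work_queue_count
FROM sys.dm_os_schedulers AS dos
WHERE dos.status = N'VISIBLE ONLINE';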

Can You Fix It Quickly?


Maybe, maybe not. If you’re allowed to kill off queries, you might be able to right the ship now. If not, you’re stuck waiting for queries to finish and give back their resources.

Longer term, you have a long checklist to go through, including asking tough questions about hardware, memory and parallelism settings, and your queries and indexes. This script will give you a lot of information about what’s going on. It’s up to you to figure out why.

If you need help with this sort of thing, drop me a line.

Thanks for reading!

SQL Server Needs Performance Views

No, Not More DMVs


Though I would be cool with new ones, as long as they’re not memes.

When you’re trying to gauge some high level performance metrics on a new server, you probably have your favorite scripts.

They could be various Blitzes, they could be some stuff you’ve had sitting in your script folder, maybe they’re from the Tiger Toolbox.

Whatever.

The point is that you, dear reader, are smart and engaged enough to know about and use these things.

A lot of people aren’t.

I’m not talking about another thing to go find and install. I mean these should come with the product.

Perf Schema


It would be really cool if SQL Server had a system schema called perf. In there you could have views to all sorts of neat things.

It would exist in every database, and it would have views in it to fully assemble the mess of DMVs that accompany:

  • Query Store
  • Plan Cache
  • Index Usage
  • Missing Indexes
  • File Stats
  • Wait Stats
  • Locking
  • Deadlocks

Assembling all those views is painful for beginners (heck, Query Store is painful for everyone). Worse, they may find scripts on the internet that are wrong or outdated (meaning they may be missing new columns, or give advice that no longer applies).
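
To make the pain concrete: just getting query text, plans, and runtime stats out of Query Store means stitching at least four views together, something like this (a minimal sketch of the kind of join a perf view could hide for you):

SELECT qsqt.query_sql_text,
       qsp.query_plan,
       qsrs.count_executions,
       qsrs.avg_duration,
       qsrs.avg_cpu_time
FROM sys.query_store_query AS qsq
JOIN sys.query_store_query_text AS qsqt
    ON qsqt.query_text_id = qsq.query_text_id
JOIN sys.query_store_plan AS qsp
    ON qsp.query_id = qsq.query_id
JOIN sys.query_store_runtime_stats AS qsrs
    ON qsrs.plan_id = qsp.plan_id;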

What would make these particularly helpful is that they could aggregate metrics at the database level. Server-wide counters are cool until your server is really wide, and it’s impossible to tell where stuff like wait stats is coming from. This wouldn’t be too difficult to implement, since Azure SQLDB already has to have a bunch of self-contained stuff, due to the lack of cross-database queries.

Best of all, Microsoft could keep them up to date based on which version and edition of SQL Server you’re on, and whether certain changes get back-ported.

Thanks for reading!

SQL Server 2019: WAIT_ON_SYNC_STATISTICS_REFRESH Redux

Here’s Another One


In the first post, I looked at a relatively large table. 50 million rows is a decent size.

But 50 million row tables might not be the target audience for this wait.

So, we’re gonna go with a >2 billion row table. Yes, dear reader, this table would break your PUNY INTEGER limits.

Slightly different


The full setup scripts are pretty long, but I’ll show the basic idea here.

Because this table is going to be fairly large, I’m gonna use clustered column store for maximum compressions.

USE StackOverflow2013;
GO

DROP TABLE IF EXISTS dbo.Vetos;
GO

CREATE TABLE dbo.Vetos
(
    Id INT NOT NULL,
    PostId INT NOT NULL,
    UserId INT NULL,
    BountyAmount INT NULL,
    VoteTypeId INT NOT NULL,
    CreationDate DATETIME NOT NULL,
    INDEX c CLUSTERED COLUMNSTORE
);


INSERT INTO dbo.Vetos WITH(TABLOCKX)
SELECT ISNULL(v.Id, 0) AS Id,
       v.PostId,
       v.UserId,
       v.BountyAmount,
       v.VoteTypeId,
       v.CreationDate
FROM
(
SELECT * FROM dbo.Votes
UNION ALL

-- I'm snipping 18 union alls here

SELECT * FROM dbo.Votes
) AS v;

The first test is just with a single statistics object.

CREATE STATISTICS s_UserId ON dbo.Vetos (UserId);

Fork In The Road


Since every sane person in the world knows that updating columnstore indexes is a donkey, I’m switching to an insert to tick the modification counter up.

INSERT INTO dbo.Vetos WITH(TABLOCKX)
SELECT ISNULL(v.Id, 0) AS Id,
       v.PostId,
       v.UserId,
       v.BountyAmount,
       v.VoteTypeId,
       v.CreationDate
FROM
(
SELECT * FROM dbo.Votes
UNION ALL
SELECT * FROM dbo.Votes
UNION ALL
SELECT * FROM dbo.Votes
UNION ALL
SELECT * FROM dbo.Votes
UNION ALL
SELECT * FROM dbo.Votes
) AS v;
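
If you want to confirm the modification counter actually ticked up before running the test query, sys.dm_db_stats_properties will show you. A quick sanity check, not part of the original demo:

SELECT s.name,
       sp.last_updated,
       sp.rows,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.Vetos');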

Query Time


To test the timing out, I can use a pretty simple query that hits the UserId column:

SELECT COUNT_BIG(*)
FROM dbo.Vetos
WHERE UserId = 138
AND 1 = (SELECT 1);

The query runs for ~3 seconds, and…

Takeover

We spent most of those three seconds waiting on the stats refresh.

I know, you’re looking at those parallelism waits.

But what if the stats update went parallel? I’ll come back to this in another post.

Query Times Two


If you’re thinking that I could test this further by adding more stats objects to the UserId column, you’d be dreadfully wrong.

SQL Server will only update one stats object per column. What’s the sense in updating a bunch of identical stats objects? I’ll talk about this more in another post, too.

If I reload the table, and create more stats objects on different columns, though…

CREATE STATISTICS s_UserId ON dbo.Vetos (UserId);
CREATE STATISTICS s_PostId ON dbo.Vetos (PostId);
CREATE STATISTICS s_BountyAmount ON dbo.Vetos (BountyAmount);
CREATE STATISTICS s_VoteTypeId ON dbo.Vetos (VoteTypeId);
CREATE STATISTICS s_CreationDate ON dbo.Vetos (CreationDate);

And then write a bigger query after inserting more data to tick up modification counters…

SELECT COUNT(*)
FROM dbo.Vetos
WHERE UserId = 138
AND   PostId = 138
AND   BountyAmount = 138
AND   VoteTypeId = 138
AND   CreationDate = 138;
Dangalang

This query runs for 14 seconds, and all of it is spent in the stats update.

Bigger, Badder


Alright, prepare to be blown away: things that are fast against 50 million rows are slower against 2 billion rows.

That includes automatic stats updates.

So yeah, if you’re up in the billion row range, automatic stats creation and updates might just start to hurt.

If you move to SQL Server 2019, you’ll have some evidence for when refreshes take a long time, but still nothing for when the initial creation takes a long time.

Thanks for reading!

What is the PVS_PREALLOCATE wait type?

I was workload testing on SQL Server 2019 RC1 when I ran into a wait type I’d never noticed before: PVS_PREALLOCATE. Wait seconds per second was about 2.5, which was pretty high for this workload. Based on the name it sounded harmless, but I wanted to look into it more to verify that.

The first thing that I noticed was that total signal wait time was suspiciously low at 12 ms. That’s pretty small compared to 55,000,000 ms of resource wait time, and it suggested a low number of wait events. During testing we log the output of sys.dm_os_wait_stats every ten seconds, so it was easy to graph the deltas for wait events and wait time for PVS_PREALLOCATE during the workload’s active period:

This is a combo chart with the y-axis for the delta of waiting tasks on the left and the y-axis for the delta of wait time in ms on the right. I excluded rows for which the total wait time of PVS_PREALLOCATE didn’t change. As you can see, there aren’t a lot of wait events in total and SQL Server often goes dozens of minutes, or sometimes several hours, before a new wait is logged to the DMV.
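
The logging behind that chart is nothing fancy. Here’s a sketch of the kind of ten-second dump involved (the table name is made up for illustration):

CREATE TABLE dbo.wait_stats_log
(
    log_time DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    wait_type NVARCHAR(60) NOT NULL,
    waiting_tasks_count BIGINT NOT NULL,
    wait_time_ms BIGINT NOT NULL,
    signal_wait_time_ms BIGINT NOT NULL
);

-- run this on a schedule, e.g. every ten seconds
INSERT INTO dbo.wait_stats_log (wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms)
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats;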

This pattern looked like a single worker that was almost always in a waiting state. To get more evidence for that I tried comparing the difference in logging time with the difference in wait time. Here are the results:

Everything matches within a margin of error of 10 seconds. Wait stats are logged every 10 seconds so everything fits. The data looks exactly as it should if a single session was almost always waiting on PVS_PREALLOCATE. I was able to find said session:

I did some more testing on another server and found that all waits were indeed tied to a single internal session id. The PVS_PREALLOCATOR process starts up along with the SQL Server service and has a wait type of PVS_PREALLOCATE until it wakes up and has something to do. Blogging friend Forrest found this quote about ADR:

The off-row PVS leverages the table infrastructure to simplify storing and accessing versions but is highly optimized for concurrent inserts. The accessors required to read or write to this table are cached and partitioned per core, while inserts are logged in a non-transactional manner (logged as redo-only operations) to avoid instantiating additional transactions. Threads running in parallel can insert rows into different sets of pages to eliminate contention. Finally, space is pre-allocated to avoid having to perform allocations as part of generating a version.

That’s good enough for me. This wait type appears to be benign from a waits stats analysis point of view and I recommend filtering it out from your queries used to do wait stats analysis. Thanks for reading!

SQL Server 2019: WAIT_ON_SYNC_STATISTICS_REFRESH

Well Have You Ever


I mean ever really wondered just how long a query of yours waited on stats to automatically update before running?

Have you ever been absolutely, positively sure that those gosh dang darn stats updates were putting your query on the trolley to Slowsville?

Your wishes will be 100% granted in SQL Server 2019.

Now, this wait doesn’t show up when stats are created the first time.

So if you run a query with a where clause on a column that doesn’t already have stats, the initial gather won’t show.

This is only for updates. Not creates.

Got it?

In Action


The following script will take the Votes table, and make a copy of it called Vetos.

Then it’ll create a PK/CX (for some reason?), and run a query against a couple columns that are mostly NULL.

Voting data gets cleaned out of the SO data dump, which is why those columns are mostly NULL.

SELECT ISNULL(Id, 0) AS Id,
       PostId,
       UserId,
       BountyAmount,
       VoteTypeId,
       CreationDate
INTO dbo.Vetos
FROM dbo.Votes;

ALTER TABLE dbo.Vetos
 ADD CONSTRAINT PK_Vetos_Id 
    PRIMARY KEY CLUSTERED(Id);

SELECT TOP 10 * 
FROM dbo.Vetos 
WHERE UserId > 0 
AND BountyAmount > 0;

The last query is important because it generates the initial stats on both of those columns.

Now let’s put some work into it!

UPDATE v
SET v.BountyAmount = 50000
FROM dbo.Vetos AS v
WHERE v.BountyAmount IS NULL;

UPDATE v
SET v.UserId = v.VoteTypeId
FROM dbo.Vetos AS v
WHERE v.UserId IS NULL;

This table has 52,928,720 rows in it. Not the biggest, but a decent size to maybe have to wait on stats to update.

Ready Steady


In separate windows, I’ll run these:

SELECT COUNT(*) AS records
FROM dbo.Vetos AS v
WHERE v.BountyAmount > 500;

SELECT COUNT(*) AS records
FROM dbo.Vetos AS v
WHERE v.UserId < 16;

They’ll trigger the stats refresh.

Fun. Yes.

Checking in on each session’s wait stats using dm_exec_session_wait_stats, our wild wait appears.
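
The check itself is nothing fancy; something along these lines, run from each query window so @@SPID points at the right session, will do (a sketch, not the exact query behind the screenshot):

SELECT dews.wait_type,
       dews.waiting_tasks_count,
       dews.wait_time_ms
FROM sys.dm_exec_session_wait_stats AS dews
WHERE dews.session_id = @@SPID
AND   dews.wait_type = N'WAIT_ON_SYNC_STATISTICS_REFRESH';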

I thought you were dead.

So there you have it. 52 million row stats refreshes take about half a second.

That wasn’t very exciting. Let’s try something else.

Tricks, Kids


Let’s start from scratch, but instead of letting SQL Server create stats automatically by running a query, we’ll create statistics with some funny options, and then update the columns.

CREATE STATISTICS s_b ON dbo.Vetos(BountyAmount) 
WITH FULLSCAN, PERSIST_SAMPLE_PERCENT = ON;

CREATE STATISTICS s_u ON dbo.Vetos(UserId) 
WITH FULLSCAN, PERSIST_SAMPLE_PERCENT = ON;

Now, when we run our select queries, things really slow down.

Sunny

Quite a bit longer on those.

Actionable?


What would one do if they encountered 15-16 second waits on this in real life?

Well, you have some options.

  • Update stats asynchronously (a sketch of that setting follows this list)
  • Create stats with no recompute and handle stats updates yourself
  • Update statistics more often than you currently do, trying to stay ahead of automatic updates
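
The async route is a single database-level switch. A minimal sketch, assuming the StackOverflow2013 demo database from the setup (queries then run with whatever stats already exist while the refresh happens in the background):

ALTER DATABASE StackOverflow2013 
    SET AUTO_UPDATE_STATISTICS_ASYNC ON;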

It’s hard to see this being a really big issue outside of very large tables, and perhaps only on under-powered servers.

Or if someone created statistics with some rather impolite settings.

Thanks for reading!

When Are Wait Stats Actually Useful?

The People Who Stare At Plans


I’m going to be honest with you, dear readers. I’ve fallen out of like with wait stats.

Unless there’s a massive resource or blocking problem, they’re a bit bland.

Tuning a single query, I don’t look at them at all.

Sure, they can be a good detail when you’re painting an overall picture of a server, but a lot of the time they’re misleading.

They could be more useful, if:

  • You knew when they happened
  • You knew which queries caused them
  • You knew which queries were impacted by them

Let’s say I look at your server. Let’s say I run any script that analyzes wait stats.

Even Your Worst


Holy cow, you sure have a lot of (THREADPOOL, RESOURCE_SEMAPHORE) waits here, pal.

You got a big problem with hardware.

Well, maybe.

Let’s say they happen at night, when no one’s around.

We could sit there all day hitting F5 and never see a sign of the problem.

It sure would be nice to know when they happened.

It Was Really Something


Not everyone’s server is a 24/7 performance nightmare.

Lots of people have servers that get steadily used from 9-5, but don’t really have spikes normally.

Except at the beginning or end of the month/year/quarter.

If you look at wait stats in general, the server might look very bored most of the time.

All those sums and averages smooth out over time.

Most people don’t always have a heart rate of 180 bpm, but they might for the 5 minutes they have a heart attack.

That’s sort of important.

The Sanest Days Are Mad


What if we see a lot of long lock waits?

I bet users are really mad about those.

Have you heard about our Lord and Savior, RCSI?

You’re definitely gonna need that.

Which database?

Oh, that logging database that users never query.

Darn.

Again, knowing which queries were on the giving and receiving end of that would be stellar.

Toolio, My Toolio


It’s not that wait stats themselves are dull, it’s the way we’re forced to look at them today.

You either have to hit F5 at the magic moment to catch them, log them on your own, or buck up for a monitoring tool.

It’s a matter of how, what, and when data gets logged for them, not a matter of tool quality.

I’d love to see a more complete picture of these things when trying to diagnose or troubleshoot issues.

Thanks for reading!

Too Much Of A Harmless Wait Can Be Harmful

Absence Of Evidence


There’s a pinky-out saying about wine: don’t hate the grape.

People say it because the same kind of grape can be grown by different people in different places.

Due to many, ahem, local factors, wine made with that grape can taste miraculously different.

It’s with that in mind that I’m going to say this: don’t ignore the wait.

No matter what script you’re using to look at wait stats, try unquoting the ignorable list and seeing what shows up.

Get curious. Poke around. You might find something interesting.

Twosifer


While experimenting with FROID, I came up with a function and query that generate some weird waits.

Those waits are EXECSYNC, and CXCONSUMER. Now, under normal circumstances, you might be able to ignore them.

But things are rarely normal when you’re experiencing performance problems, are they? If you ignore too much, you can miss big problems.

Going back to running this query, I can see the wait stats that get generated in sys.dm_exec_session_wait_stats when the query is finished.

SELECT u.DisplayName, 
       dbo.TotalScore(u.Id) AS TotalScore
FROM dbo.Users AS u
WHERE u.Reputation >= 200000
ORDER BY u.Id;
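
If you want to eyeball them yourself rather than squint at a screenshot, a quick sketch against that DMV, run from the same session as the query, looks like this:

SELECT dews.wait_type,
       dews.waiting_tasks_count,
       dews.wait_time_ms
FROM sys.dm_exec_session_wait_stats AS dews
WHERE dews.session_id = @@SPID
AND   dews.wait_type IN (N'EXECSYNC', N'CXCONSUMER')
ORDER BY dews.wait_time_ms DESC;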

Here’s what those waits look like:

Better Off Dead

If one were to follow advice — even advice from Microsoft — one may miss important clues as to what happened.

CXCONSUMER waits being high is fairly tightly correlated to skewed parallelism, and this is no exception.

EXECSYNC represents a serial zone within a parallel plan, in this case building two Eager Index Spools:

SPOOL-TA-TOES

When you spend a long time building indexes single threaded, you spend a long time waiting on CXCONSUMER (and not so much time at all waiting on CXPACKET).

Being able to put the waits together with the query plan can help you tune queries more efficiently.

This is especially true if you’re on earlier versions of SQL Server/SSMS where the kind of detail shown in query plans here doesn’t exist.

Thanks for reading!