Starting SQL: Waits And Measures

Done Started Something


If you talk to anyone who bills by the hour for SQL Server, one of the first things they’ll ask you is what your wait stats are.

It’s not a bad question; it kills some time, and it lets them gauge how engaged you are with your server.

  • Waits can sometimes indicate a severe bottleneck
  • Waits can also be “high” on a server that has perfectly acceptable throughput

There are also some problems with wait stats, and performance counters in general: there’s no direct correlation between the counters and the queries that caused them.

Sure, Query Store in 2017+ tracks high level wait stats for queries, but I don’t run into a lot of people using Query Store still.

The other trouble with performance counters is that they just aggregate over time, with no partitioning either by database or time.

Brooklyn Basement


Some things to keep in mind when you’re looking at wait stats:

  • How they compare to server uptime
  • How many times the wait occurred
  • How long each occurrence took to complete on average

If your script or monitoring tool doesn’t report this stuff, it’s leaving some really important stuff out of the picture.

The relationship to server up-time is important, because if all you have to work off of is percentages, you have no idea if your server is constantly grinding on these waits, or if they only happened for a few minutes.

As a thought experiment: say 100% of your waits are on PAGEIOLATCH_SH, and you have 2 hours total of those waits.

How does your opinion of those waits change if the server has been up for:

  • 1 hour
  • 4 hours
  • 24 hours
  • 96 hours

Typically, the longer a server has been up, and the less concentrated waits become, the less of a focal point they should be. There’s no magic ratio here, but for me, the closer wait times are to up time, the more I may want to look at them.

Longs and Shorts


Yes, there can be a mix of these, but there’s still two basic ways that wait stats rack up.

  • Lots of waits that happen pretty quickly
  • Fewer waits that last a long time

Obviously, the source of those waits will impact how much we care about them. Here are some common examples:

  • You wait often on storage, but the waits are short (say <10 ms)
  • You don’t wait often on locks, but the waits are long (say > 10,000 ms)

It’s pretty trivial to look at your wait stats

I’m not gonna reinvent any wheels here, but I will reiterate a couple things:

  • You can see which queries are generating waits while¬† they’re running, but…
  • If you’re not constantly logging that, it disappears on you
  • Once waits are completed, they are faceless and blameless
  • Waits in general can be far less valuable when workloads are inconsistent

Attentione!


Tomorrow, I’ll talk through some of the most common waits, and first-line ways to start solving them.

They may not solve every single problem, but they’re generally a good place to start.

Thanks for reading!

For the entire month of August, you can get 75% off my recorded video training with the coupon code “cruelsummer”



Leave a Reply

Your email address will not be published. Required fields are marked *