In SQL Server Lock Waits Come From Blocking, Not Locks Being Taken By Queries

Waiting Game

In SQL Server, troubleshooting blocking problems is a pain. It’s one of those things you really have to be monitoring for pretty actively in order to catch the full extent of it.

You might catch glimpses of it in sp_WhoIsActive or sp_BlitzWho, but it’s really hard to get the full picture and blocking chain for the duration of the event.

This is much easier done via dedicated monitoring tools, where the GUI will often give you a tree view of the blocking.

Does Query Store Help?

In SQL Server 2017, we got some high level wait stats associated with queries.

These are generally more helpful to identify which queries were being blocked, but still not who was doing the blocking.

Regular wait stats (i.e. from sys.dm_os_wait_stats) have no association to the queries that caused them.

If you’re on SQL Server 2017 or better, and you’re using Query Store, you can get a look at those waits with a query like this:

SELECT   qsws.wait_category_desc,
         SUM(qsws.total_query_wait_time_ms) / 1000. AS total_query_wait_time_s,
         AVG(qsws.avg_query_wait_time_ms) / 1000. AS avg_query_wait_time_s,
         MAX(qsws.max_query_wait_time_ms) / 1000. AS max_query_wait_time_s
FROM     sys.query_store_wait_stats AS qsws
GROUP BY qsws.wait_category_desc
ORDER BY total_query_wait_time_s DESC;

The view of your wait stats is far less detailed, but at least it’s only this one database. Look, at least you get that.

With that out of the way, let’s take that simple query and make a couple minor adjustments to get some other information out.

WITH qs_waits AS (
SELECT   qsws.wait_category_desc,
         qsws.plan_id,
         SUM(qsws.total_query_wait_time_ms) / 1000. AS total_query_wait_time_s,
         AVG(qsws.avg_query_wait_time_ms) / 1000. AS avg_query_wait_time_s,
         MAX(qsws.max_query_wait_time_ms) / 1000. AS max_query_wait_time_s
FROM     sys.query_store_wait_stats AS qsws
WHERE qsws.wait_category_desc = 'Lock'
GROUP BY qsws.wait_category_desc, qsws.plan_id
)
SELECT qsw.*,
       r.*,
       p.query_sql_text,
       TRY_CONVERT(XML, p.query_plan) AS query_plan
FROM qs_waits AS qsw
OUTER APPLY
(
    SELECT TOP (1) qsp.plan_id,
                   qsp.query_plan,
                   qsqt.query_sql_text
    FROM sys.query_store_plan AS qsp
    JOIN sys.query_store_query AS qsq
        ON qsq.query_id = qsp.query_id
    JOIN sys.query_store_query_text AS qsqt
        ON qsq.query_text_id = qsqt.query_text_id
    WHERE qsw.plan_id = qsp.plan_id
    ORDER BY qsp.last_execution_time DESC
) AS p
OUTER APPLY
(
    SELECT TOP (1) qsrs.avg_duration / 1000. AS avg_duration, 
                   qsrs.avg_cpu_time / 1000. AS avg_cpu_time
    FROM sys.query_store_runtime_stats AS qsrs
    WHERE qsrs.plan_id = p.plan_id
    ORDER BY qsrs.last_execution_time DESC
) AS r;

There we go.

Nice, clean, simple, but most important totally intuitive. I love how easy it is to quickly get the information you want.

Lookin’ For Locks

That query will get information about queries which waited on Locks — that’s why we’re getting plan_id up in our CTE.

After that, we’re getting the plan and text of any queries that waited on locks, and a couple vanity metrics.

Only one query waited on locks. Make a couple notes here, though:

avg_query_wait_time is 21 seconds
avg_query_duration is 21 seconds
avg_cpu_time is 0.12 seconds

My workload is Highly Contrived™ so the avg wait and query duration line up. In real life, you probably won’t have queries that were only ever waiting on locks to be released.

But it’s worth making some comparisons like this when you’re having blocking problems, especially when you tie wait times and durations in with cpu time.

If cpu is very low but duration is high, generally, you’ve found blocked queries.

Looking at the text and plan, we can also reasonably surmise that this read query wasn’t blocking anything.

But Still…

This doesn’t tell us what query was blocking the select. If you want to figure that out, you’ve got some options:

Log sp_WhoIsActive to a table on a schedule
Use sp_HumanEvents to monitor for blocking
Go buy a monitoring tool

Sure, you could also go digging through DMVs to find modification queries that last ran around the same time, but I’ve hit a lot of dead ends there.

A lot of this pain would go away if SQL Server were optimistic by default, or if you’re allowed to enable an optimistic isolation level.

And remember: Queries taking locks don’t register locking waits. Only queries being blocked register locking waits.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

3 thoughts on “In SQL Server Lock Waits Come From Blocking, Not Locks Being Taken By Queries”

happydba says:

May 28, 2020 at 11:25 am

Thanks for the discount on the course! Excited to watch it!
1. Erik Darling says:
  
  May 28, 2020 at 11:27 am
  
  My pleasure! Always like to reward people who read until the end ?
Pingback: Lock Waits are Really Blocking Waits – Curated SQL

Comments are closed.