Lock Waits Come From Blocking, Not Locks Being Taken

Waiting Game


In SQL Server, troubleshooting blocking problems is a pain. It’s one of those things you really have to be monitoring for pretty actively in order to catch the full extent of it.

You might catch glimpses of it in sp_WhoIsActive or sp_BlitzWho, but it’s really hard to get the full picture and blocking chain for the duration of the event.

This is much easier done via dedicated monitoring tools, where the GUI will often give you a tree view of the blocking.

Does Query Store Help?


In SQL Server 2017, we got some high level wait stats associated with queries.

These are generally more helpful to identify which queries were being blocked, but still not who was doing the blocking.

Regular wait stats (i.e. from sys.dm_os_wait_stats) have no association to the queries that caused them.

If you’re on SQL Server 2017 or better, and you’re using Query Store, you can get a look at those waits with a query like this:

SELECT   qsws.wait_category_desc,
         SUM(qsws.total_query_wait_time_ms) / 1000. AS total_query_wait_time_s,
         AVG(qsws.avg_query_wait_time_ms) / 1000. AS avg_query_wait_time_s,
         MAX(qsws.max_query_wait_time_ms) / 1000. AS max_query_wait_time_s
FROM     sys.query_store_wait_stats AS qsws
GROUP BY qsws.wait_category_desc
ORDER BY total_query_wait_time_s DESC;

The view of your wait stats is far less detailed, but at least it’s only this one database. Look, at least you get that.

With that out of the way, let’s take that simple query and make a couple minor adjustments to get some other information out.

WITH qs_waits AS (
SELECT   qsws.wait_category_desc,
         qsws.plan_id,
         SUM(qsws.total_query_wait_time_ms) / 1000. AS total_query_wait_time_s,
         AVG(qsws.avg_query_wait_time_ms) / 1000. AS avg_query_wait_time_s,
         MAX(qsws.max_query_wait_time_ms) / 1000. AS max_query_wait_time_s
FROM     sys.query_store_wait_stats AS qsws
WHERE qsws.wait_category_desc = 'Lock'
GROUP BY qsws.wait_category_desc, qsws.plan_id
)
SELECT qsw.*,
       r.*,
       p.query_sql_text,
       TRY_CONVERT(XML, p.query_plan) AS query_plan
FROM qs_waits AS qsw
OUTER APPLY
(
    SELECT TOP (1) qsp.plan_id,
                   qsp.query_plan,
                   qsqt.query_sql_text
    FROM sys.query_store_plan AS qsp
    JOIN sys.query_store_query AS qsq
        ON qsq.query_id = qsp.query_id
    JOIN sys.query_store_query_text AS qsqt
        ON qsq.query_text_id = qsqt.query_text_id
    WHERE qsw.plan_id = qsp.plan_id
    ORDER BY qsp.last_execution_time DESC
) AS p
OUTER APPLY
(
    SELECT TOP (1) qsrs.avg_duration / 1000. AS avg_duration, 
                   qsrs.avg_cpu_time / 1000. AS avg_cpu_time
    FROM sys.query_store_runtime_stats AS qsrs
    WHERE qsrs.plan_id = p.plan_id
    ORDER BY qsrs.last_execution_time DESC
) AS r;

There we go.

Nice, clean, simple, but most important totally intuitive. I love how easy it is to quickly get the information you want.

😶

Lookin’ For Locks


That query will get information about queries which waited on Locks — that’s why we’re getting plan_id up in our CTE.

After that, we’re getting the plan and text of any queries that waited on locks, and a couple vanity metrics.

Growing Away

Only one query waited on locks. Make a couple notes here, though:

  • avg_query_wait_time is 21 seconds
  • avg_query_duration is 21 seconds
  • avg_cpu_time is 0.12 seconds

My workload is Highly Contrived™ so the avg wait and query duration line up. In real life, you probably won’t have queries that were only ever waiting on locks to be released.

But it’s worth making some comparisons like this when you’re having blocking problems, especially when you tie wait times and durations in with cpu time.

If cpu is very low but duration is high, generally, you’ve found blocked queries.

Collectiving

Looking at the text and plan, we can also reasonably surmise that this read query wasn’t blocking anything.

But Still…


This doesn’t tell us what query was blocking the select. If you want to figure that out, you’ve got some options:

Sure, you could also go digging through DMVs to find modification queries that last ran around the same time, but I’ve hit a lot of dead ends there.

A lot of this pain would go away if SQL Server were optimistic by default, or if you’re allowed to enable an optimistic isolation level.

And remember: Queries taking locks don’t register locking waits. Only queries being blocked register locking waits.

For a limited time, get my Locking and Blocking course for just 99 cents!

Thanks for reading!



3 thoughts on “Lock Waits Come From Blocking, Not Locks Being Taken

Leave a Reply

Your email address will not be published. Required fields are marked *