Mama, I Tried
Back when I first wrote sp_BlitzQueryStore, I was totally enamored with Query Store.
Like the plan cache, but better. History. Different plans for the same query. Not disturbed by memory pressure or restarts.
Then I waited patiently to find a client on 2016 using it.
And waited, and waited, and waited.
And finally, some came along.
Slow Pokes And No Pokes
When I ran it, it took forever. Not even the XML part. The XML part was fast.
Gathering the initial set of data was slow.
With some time to experiment and dig in, I found that the IN_MEM tables cause significant performance issues when:
- Query Store is actively logging data
- Query Store is > 25 MB or so
Yes, children, in memory tables can be slow, too.
Let’s take a couple simple queries against Query Store tables:
SELECT TOP 10 * FROM sys.query_store_runtime_stats AS qsrs WHERE qsrs.avg_cpu_time >= 500000 AND qsrs.last_execution_time >= DATEADD(DAY, -1, GETDATE()) ORDER BY qsrs.avg_cpu_time DESC; SELECT TOP 10 * FROM sys.query_store_plan AS qsp WHERE qsp.query_plan IS NOT NULL AND qsp.last_execution_time >= DATEADD(DAY, -1, GETDATE()) ORDER BY qsp.last_execution_time DESC;
The first query runs for 10 seconds, with the entirety of the time spent filtering data out of the IN_MEM table:
The second query is even worse, at nearly 2 minutes:
I know, this configuration is probably unsupported because I used SELECT * or something.
I wrote this query hoping to quickly get the worst plans by a specific metric.
WITH the_pits AS ( SELECT TOP ( 101 ) qsrs.plan_id, qsp.query_id, qsrs.avg_duration / 100000. AS avg_duration_s, qsrs.avg_cpu_time / 100000. AS avg_cpu_time_s, qsrs.avg_query_max_used_memory, qsrs.avg_logical_io_reads, qsrs.avg_logical_io_writes, qsrs.avg_tempdb_space_used, qsrs.last_execution_time, /* You can stick any of the above metrics in here to find offenders by different resource abuse */ MAX(qsrs.avg_cpu_time) OVER ( PARTITION BY qsp.query_id ORDER BY qsp.query_id ROWS UNBOUNDED PRECEDING ) AS n FROM sys.query_store_runtime_stats AS qsrs JOIN sys.query_store_plan AS qsp ON qsp.plan_id = qsrs.plan_id WHERE qsrs.avg_duration >= ( 5000. * 1000. ) AND qsrs.avg_cpu_time >= ( 1000. * 1000. ) AND qsrs.last_execution_time >= DATEADD(DAY, -7, GETDATE()) AND qsp.query_plan IS NOT NULL /* Don't forget to change this to same thing! */ ORDER BY qsrs.avg_cpu_time DESC ) SELECT p.plan_id, p.query_id, p.avg_duration_s, p.avg_cpu_time_s, p.avg_query_max_used_memory, p.avg_logical_io_reads, p.avg_logical_io_writes, p.avg_tempdb_space_used, p.last_execution_time, qsqt.query_sql_text, TRY_CONVERT(XML, qsp.query_plan) AS query_plan FROM sys.query_store_plan AS qsp JOIN the_pits AS p ON p.plan_id = qsp.plan_id JOIN sys.query_store_query AS qsq ON qsq.query_id = qsp.query_id JOIN sys.query_store_query_text AS qsqt ON qsq.query_text_id = qsqt.query_text_id ORDER BY p.n DESC;
It works pretty well. Sometimes.
Other times, it runs for 4.5 minutes.
I know what you’re thinking: “Erik, you’re doing all sorts of crazy stuff in there. You’re making it slow.”
But none of the crazy stuff I’m doing is where the slowdown is.
It’s all in the same stuff I pointed out in the simpler queries.
I can’t stress how much I want Query Store to be successful. I absolutely love the idea.
But it just wasn’t implemented very well. Simple filtering against the data takes forever.
And yes, you can have NULL query plans for some reason. That’s rich.
The irony of needing to tune queries so you can find queries to tune is ironic.
I’m nearly sure of it.
Thanks for reading!