I was helping a client troubleshoot a scalability issue recently, where during one point in their workload tempdb went out of control.
What was puzzling them was that nothing in the stored procedure did anything with temporary objects, cursors, or any of the other obvious things that go to el baño público. And the query plan itself was a pretty simple little loopy thing.
It wasn’t until I grabbed a quick sample of actual execution plans with Extended Events that the problem became more obvious.
Despite the fact that the query was only passing around 7 rows, we were getting a one row estimate, and there was a spilling Sort operator.
Chance Of Cloudballs
There were a couple compounding issues here, of course, like that their database is hosted in the cloud.
It doesn’t matter which cloud you talk about, this is a problem they all have and I see people face constantly.
I/O sucks, up there.
It was a lot harder to get this to repro on my local hardware, because I have very nice drives.
Let’s take a look!
If I run this query, it’ll spill a little bit. I have the MAXDOP 1 hint on there to artificially slow it down. Without that, the fast query and I/O make the contention harder to hit.
SELECT TOP (1000) u.Id FROM dbo.Users AS u ORDER BY u.Reputation OPTION(MAXDOP 1);
I stuck it in a stored procedure to make calling it from ostress a little bit easier.
After running for a bit, eventually we hit GAM contention:
We can see all the classic signs of tempdb contention, arising from spills.
It’s worth noting that enabling in-memory tempdb seems to alleviate this issue for the most part. I’d show you a picture of that, but it’s not worth 1000 words.
Fixing The Spill
Normally, I wouldn’t bother fixing a spill of that size, but in this case it was worthwhile because it was harming concurrency.
The quick fix was to add a MIN_GRANT_PERCENT hint. The long-term fix is to add an index to support the Sort operation.
Good ol’ indexes, putting data in order.
There are many reasons why a spill may occur. Poor estimates are quite common, but the reason for that poor estimate has about a billion potential root causes.
Even if your query gets enough memory, sometimes you may still see spills because certain operators had to share memory.
It’s a wild world.
Thanks for reading!