Here we go again, with me promising to blog about something later.
This time it’s an attempt to explain how SQL Server chooses which statistics to update.
It’s not glamorous, and it may even make you angry, but you know.
They can’t all be posts about…
*stares into the camera*
*tears up notes*
*stares off camera until someone cuts to commercials*
And We’re Back
Let’s start with the query we’re going to use to examine our statistics.
SELECT t.name, s.name, s.stats_id, sp.last_updated, sp.rows, sp.rows_sampled, sp.modification_counter FROM sys.stats AS s JOIN sys.tables AS t ON s.object_id = t.object_id CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp WHERE t.name = 'UserStats';
Right now, the results aren’t too interesting, because we only have a statistics object attached to the Primary Key.
We’re not gonna touch that column. We’re gonna use another column.
This query will get system generated statistics created on the AccountId column.
SELECT COUNT(*) FROM dbo.UserStats AS u WHERE u.AccountId > 1000 AND u.AccountId < 9999 OPTION(RECOMPILE);
By itself, this isn’t very interesting. Let’s create an index, too.
CREATE INDEX ix_AccountId ON dbo.UserStats ( AccountId );
The index created statistics, too. With the equivalent of a full scan! See that rows_sampled column?
I mean, why not, if you’re already scanning the whole table to get the data you need for the index, right?
I’m gonna use a couple updates to flip values around.
UPDATE u SET u.AccountId = u.UpVotes + u.DownVotes FROM dbo.UserStats AS u WHERE 1 = 1; UPDATE u SET u.AccountId = u.UpVotes - u.DownVotes FROM dbo.UserStats AS u WHERE 1 = 1;
Don’t ask me why I swallowed a fly.
But the WHERE 1 = 1 is enough to get SQL Prompt to not warn me about running an update with no where clause.
Both stats objects have been modified the same number of times.
Let’s run our COUNT query and see what happens!
We can see that only the stats for the index were updated (and with the default sampling rate, not a full scan).
Now let’s create another stats object with FULLSCAN.
CREATE STATISTICS s_AccountId ON dbo.UserStats ( AccountId ) WITH FULLSCAN;
We’ll also go ahead and run an update again.
And then our COUNT query…
SQL Server took two perfectly good fully sampled statistics and reduced them to the default sampling.
This doesn’t hurt our query, but it certainly is annoying to see.
That’s why newer versions of SQL Server allow you to persist the sampling rate.
Latest and Greatest
A lot of the stuff people call “rocket science” about statistics options, like auto create and auto update stats, are there for a reason.
When you let SQL Server make choices, they’re not always the best ones.
Tracking this stuff down and understanding when and if it’s a problem is hard work, though. Don’t flip those switches lightly, my friends.
Thanks for reading!