How Stats Get Updated Automatically

Spawning Monsters


Here we go again, with me promising to blog about something later.

This time it’s an attempt to explain how SQL Server chooses which statistics to update.

It’s not glamorous, and it may even make you angry, but you know.

They can’t all be posts about…

*checks notes*

*stares into the camera*

*tears up notes*

*tears up*

*stares off camera until someone cuts to commercials*

And We’re Back


Let’s start with the query we’re going to use to examine our statistics.

    SELECT      t.name, 
	            s.name, 
				s.stats_id,
				sp.last_updated, 
				sp.rows, 
				sp.rows_sampled, 
				sp.modification_counter
    FROM        sys.stats AS s
    JOIN        sys.tables AS t
        ON s.object_id = t.object_id
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE       t.name = 'UserStats';

Right now, the results aren’t too interesting, because we only have a statistics object attached to the Primary Key.

We’re not gonna touch that column. We’re gonna use another column.

This query will get system generated statistics created on the AccountId column.

    SELECT COUNT(*)
    FROM   dbo.UserStats AS u
    WHERE  u.AccountId > 1000 
    AND    u.AccountId < 9999
	OPTION(RECOMPILE);
How nice of you to ask.

By itself, this isn’t very interesting. Let’s create an index, too.

    CREATE INDEX ix_AccountId ON dbo.UserStats ( AccountId );
Take Me Out Tonight

The index created statistics, too. With the equivalent of a full scan! See that rows_sampled column?

I mean, why not, if you’re already scanning the whole table to get the data you need for the index, right?

Right.

I’m gonna use a couple updates to flip values around.

	UPDATE u
	SET u.AccountId = u.UpVotes + u.DownVotes
	FROM dbo.UserStats AS u
	WHERE 1 = 1;
	
	UPDATE u
	SET u.AccountId = u.UpVotes - u.DownVotes
	FROM dbo.UserStats AS u
	WHERE 1 = 1;

Don’t ask me why I swallowed a fly.

But the WHERE 1 = 1 is enough to get SQL Prompt to not warn me about running an update with no where clause.

Modifideded.

Both stats objects have been modified the same number of times.

Let’s run our COUNT query and see what happens!

Oh, dammit.

We can see that only the stats for the index were updated (and with the default sampling rate, not a full scan).

Now let’s create another stats object with FULLSCAN.

    CREATE STATISTICS s_AccountId ON dbo.UserStats ( AccountId ) WITH FULLSCAN;

We’ll also go ahead and run an update again.

B-b-b-b-back

And then our COUNT query…

Ayeeeeeeee

SQL Server took two perfectly good fully sampled statistics and reduced them to the default sampling.

This doesn’t hurt our query, but it certainly is annoying to see.

That’s why newer versions of SQL Server allow you to persist the sampling rate.

Latest and Greatest


A lot of the stuff people call “rocket science” about statistics options, like auto create and auto update stats, are there for a reason.

When you let SQL Server make choices, they’re not always the best ones.

Tracking this stuff down and understanding when and if it’s a problem is hard work, though. Don’t flip those switches lightly, my friends.

Thanks for reading!



Leave a Reply

Your email address will not be published. Required fields are marked *