When Does UDF Inlining Kick In?

The Eye


UPDATE: After writing this and finding the results fishy, I reported the behavior described below in “Somewhat Surprising” and “Reciprocal?”, and it was confirmed as a defect in SQL Server 2019 CU8, though I haven’t tested earlier CUs to see how far back it goes. If you’re experiencing this behavior, you’ll have to disable UDF inlining in another way until CU releases resume in the New Year.

With SQL Server 2019, UDF inlining promises to, as best it can, inline all those awful scalar UDFs that have been haunting your database for ages and making queries perform terribly.

But on top of the long list of restrictions, there are a number of other things that might inhibit it from kicking in.

For example, there’s a database scoped configuration:

ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = ON/OFF; --Toggle this

SELECT 
    dsc.*
FROM sys.database_scoped_configurations AS dsc
WHERE dsc.name = N'TSQL_SCALAR_UDF_INLINING';

There’s a function characteristic you can use to turn them off:

CREATE OR ALTER FUNCTION dbo.whatever()
RETURNS int
WITH INLINE = OFF --Toggle this between ON and OFF
AS
BEGIN
    RETURN 1;
END;
GO

And your function may or may not even be eligible:

SELECT 
    OBJECT_NAME(sm.object_id) AS object_name,
    sm.is_inlineable
FROM sys.sql_modules AS sm
JOIN sys.all_objects AS ao
    ON sm.object_id = ao.object_id
WHERE ao.type = 'FN';

Somewhat Surprising


One thing that caught me off guard was that having the database in compatibility level 140, but running the query in compatibility level 150, also nixed the dickens out of it.

DBCC FREEPROCCACHE;
GO 

ALTER DATABASE StackOverflow2013 SET COMPATIBILITY_LEVEL = 140;
GO 
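
Quick note: dbo.serializer isn’t defined in this post. Any scalar UDF will reproduce the behavior; here’s a hypothetical stand-in (the name and body are made up, and it’s simple enough to be eligible for inlining):

CREATE OR ALTER FUNCTION dbo.serializer (@n int)
RETURNS int
AS
BEGIN
    RETURN @n; --hypothetical stand-in; any scalar UDF shows the behavior
END;
GO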

WITH Comments AS 
(
    SELECT
        dbo.serializer(1) AS udf, --a function
        ROW_NUMBER() 
            OVER(ORDER BY 
                     c.CreationDate) AS n
    FROM dbo.Comments AS c
)
SELECT 
    c.*
FROM Comments AS c
WHERE c.n BETWEEN 1 AND 100
OPTION(USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_150'), MAXDOP 8);
GO

Our query has all the hallmarks of one that has been inflicted with functions: it can’t go parallel.

And if you’re on SQL Server 2016+, you can see that it executes once per row:

SELECT 
    OBJECT_NAME(defs.object_id) AS object_name,
    defs.execution_count,
    defs.total_worker_time,
    defs.total_physical_reads,
    defs.total_logical_writes,
    defs.total_logical_reads,
    defs.total_elapsed_time
FROM sys.dm_exec_function_stats AS defs;
rockin’ around

Reciprocal?


There’s an odd contradiction here, though. If we repeat the experiment setting the database compatibility level to 150, but running the query in compatibility level 140, the function is inlined.

DBCC FREEPROCCACHE;
GO 

ALTER DATABASE StackOverflow2013 SET COMPATIBILITY_LEVEL = 150;
GO 

WITH Comments AS 
(
    SELECT
        dbo.serializer(c.Id) AS udf,
        ROW_NUMBER() 
            OVER(ORDER BY 
                     c.CreationDate) AS n
    FROM dbo.Comments AS c
)
SELECT 
    c.*
FROM Comments AS c
WHERE c.n BETWEEN 1 AND 100
OPTION(USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140'), MAXDOP 8);
GO

Rather than a non-parallel plan with a NonParallelPlanReason, we see a parallel plan, and an attribute telling us that a UDF has been inlined.

call hope
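
If you’d rather not click around plan properties, you can also look for the inlining marker in the plan XML. A rough sketch, checking cached plans for the ContainsInlineScalarTsqlUdfs attribute that SQL Server 2019 exposes in showplan:

WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    deqp.query_plan
FROM sys.dm_exec_cached_plans AS decp
CROSS APPLY sys.dm_exec_query_plan(decp.plan_handle) AS deqp
WHERE deqp.query_plan.exist('//StmtSimple[@ContainsInlineScalarTsqlUdfs="true"]') = 1;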

And if we re-check the dm_exec_function_stats DMV, it will have no entries. That seems more than a little bit weird to me, but hey.

I’m just a lowly consultant on SSMS 18.6

Thanks for reading!

A Word From Our Sponsors


First, a huge thank you to everyone who has bought my training so far. You all are incredible, and I owe all of you a drink.

Your support means a lot to me, and allows me to do nice stuff for other people, like give training away for free.

So far, I’ve donated $45k (!!!) worth of training to folks in need, no questions asked.

Next year, I’d like to keep doing the same thing. I’d also like to produce a whole lot more training to add value to the money you spend. In order to do that, I need to take time off from consulting, which isn’t easy to do. I’m not crying poor, but saying no to work for chunks of time isn’t easy for a one-person party.

I’m hoping that I can make enough in training bucks to make that possible.

Because this sale is extra sale-y, I’ve decided to name it after the blackest black known to man.

From today until December 31st, you can get all 25 hours of my recorded training content for just $100.00. Click the link below to add everything to your cart, and use the discount code AllFor100 to apply the discount.

Everything

Everything

Everything

Some fine print: it only works if you add EVERYTHING. It’s a fixed-amount discount code that only kicks in once you’ve spent a certain amount.

Thanks for reading, and for your support.

An Overlooked Benefit Of Batch Mode With Windowing Functions

Lavender


If you ask people who tune queries why batch mode is often much more efficient with windowing functions, they’ll tell you about the window aggregate operator.

That’s all well and good, but there’s another, often sneaky limitation of fully row mode execution plans with windowing functions.

Let’s go take a look!

Global Aggregates


One thing that causes an early serial zone in execution plans is a windowing function that only has an ORDER BY, with no PARTITION BY.

For example, let’s look at the plans for these two queries:

WITH Comments AS 
(
    SELECT
        ROW_NUMBER() 
            OVER(ORDER BY 
                     c.CreationDate) AS n
    FROM dbo.Comments AS c
)
SELECT *
FROM Comments AS c
WHERE c.n = 0;

WITH Comments AS 
(
    SELECT
        ROW_NUMBER() 
            OVER(PARTITION BY 
                     c.UserId
                 ORDER BY 
                     c.CreationDate) AS n
    FROM dbo.Comments AS c
)
SELECT *
FROM Comments AS c
WHERE c.n = 0;
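
Before looking at the plans, a quick setup sketch, assuming the same StackOverflow2013 database as the earlier examples:

ALTER DATABASE StackOverflow2013 SET COMPATIBILITY_LEVEL = 140;
GO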

The resulting estimated plans look like this, using the 140 compatibility level:

oops

In the top plan, where the windowing function only has an order by, the serial zone happens immediately before the Segment operator. In the second plan, the parallel zone carries on until right before the select operator.

If you’re wondering why we’re only looking at estimated plans here, it’s because repartition streams ruins everything.

In The Year 2000


In compatibility level 150, things change a bit (yes, a window aggregate appears):

merry christmas

And the window aggregate appears within the parallel zone. The parallel zone does end before the filter operator, which may or may not be a disaster depending on how restrictive your filter is, and how many rows end up at it.

Also note the distinct lack of a repartition streams operator ruining everything. We’ll talk about that tomorrow.

Thanks for reading!


Batch Mode On Row Store vs Batch Mode Tricks

Quiet Time


I think Batch Mode is quite spiffy for the right kind of query, but up until SQL Server 2019, we had to play some tricks to get it:

  • Do a funny join to an empty table with a column store index
  • Create a filtered column store index with no data in it (see the sketch below)
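
The first trick shows up a little later in this post. The second one looks something like this sketch: a filtered nonclustered columnstore index with an impossible predicate, so it can never contain any rows (the index name here is made up):

CREATE NONCLUSTERED COLUMNSTORE INDEX batch_mode_trick
ON dbo.Posts (Id)
WHERE Id = -1 AND Id = -2;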

If you’re on SQL Server 2019 Enterprise Edition, and you’ve got your database in compatibility level 150, you may heuristically receive Batch Mode without those tricks.

One important difference between Batch Mode Tricks™ and Batch Mode On Rowstore (BMOR) is that the latter allows you to read from row mode tables using Batch Mode, while the former doesn’t.

Tricks have limits, apparently.

Squish Squish


To cut down on typing, I’ll often create a helper object like this:

CREATE TABLE dbo.t
(
    id int NULL,
    INDEX c CLUSTERED COLUMNSTORE
);

If you read this post, you’ll understand more about why.

Now, let’s compare these two queries:

SELECT 
    p.OwnerUserId,
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p
LEFT JOIN dbo.t
    ON 1 = 0
WHERE p.Score < 50
GROUP BY p.OwnerUserId
HAVING COUNT_BIG(*) > 2147483647
OPTION(USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140'), MAXDOP 8);

SELECT 
    p.OwnerUserId,
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.Score < 50
GROUP BY p.OwnerUserId
HAVING COUNT_BIG(*) > 2147483647
OPTION(USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_150'), MAXDOP 8);

 

One executes in compatibility level 140, the other in 150.

Splish Splash


There are a couple of interesting things here.

the porter

Even though both queries have operators that execute in Batch Mode (Filter, Hash Match), only the second query can read from the row store clustered index in Batch Mode. In this case, that shaves a couple hundred milliseconds off the seek.

There is likely some additional invisible benefit to not having to convert the row mode seek to a batch mode hash join at the next operator, since one executes for 501ms, and the other executes for 278ms. There’s nothing in the query plan to signal that happening, so you’ll just have to use your imagination.

Thanks for reading!


SQL Server 2019: When Batch Mode On Rowstore Isn’t Better

Um Hello


WHAT DO YOU MEAN YOU’RE NOT ON SQL SERVER 2019 YET.

Oh. Right.

That.

Regressed


Look, whenever you make changes to the optimizer, you’re gonna hit some regressions.

And it’s not just upgrading versions, either. You can have regressions from rebuilding or restarting or recompiling or a long list of things.

Databases are terribly fragile places. You have to be nuts to work with them.

I’m not mad at 2019 or Batch Mode On Rowstore (BMOR) or anything.

But if I’m gonna get into it, I’m gonna document issues I run into so that hopefully they help you out, too.

One thing I ran into recently was where BMOR kicked in for a query and made it slow down.

Repro


Here’s my index:

CREATE INDEX mailbag ON dbo.Posts(PostTypeId, OwnerUserId) WITH(DATA_COMPRESSION = ROW);

And here’s my query:

SELECT u.Id, u.DisplayName, u.Reputation,
       (SELECT COUNT_BIG(*) FROM dbo.Posts AS pq WHERE pq.OwnerUserId = u.Id AND pq.PostTypeId = 1) AS q_count,
       (SELECT COUNT_BIG(*) FROM dbo.Posts AS pa WHERE pa.OwnerUserId = u.Id AND pa.PostTypeId = 2) AS a_count
FROM dbo.Users AS u
WHERE u.Reputation >= 25000
ORDER BY u.Id;

It’s simplified a bit from what I ran into, but it does the job.

Batchy

This is the batch mode query plan. It runs for about 2.6 seconds.

who would complain?

Rowy

And here’s the row mode query plan. It runs for about 1.3 seconds.

oh that’s why.

What Happened?


Just when you think the future is always faster, life comes at you like this.

So why is the oldmode query more than 2x faster than the newhotmode query?

There’s a reason, and it’s not very sexy.

Batch Like That

First, the hash joins produce Bitmaps.

bitted

You don’t see Bitmaps in Batch Mode plans as operators like you’re used to in Row Mode plans. You have to look at the properties (not the tool tip) of the Hash Join operator.

Even though both plans seek into the index on Posts, the seek predicate in the Batch Mode plan only covers PostTypeId.

It would be boring to show you both, so I’m just going to use the details from the branch where we find PostTypeId = 2.

buck fifty

Remember this pattern: we seek to all the values where PostTypeId = 2, and then apply the Bitmap as a residual predicate.

You can pretty easily mentally picture that.

Rowbot

In the row mode plan, the Nested Loops Joins are transformed to Apply Nested Loops:

applys and oranges

Which means on the inner side of the join, both the PostTypeId and the OwnerUserId qualify as seek predicates:

oh yeah that

Reading Rainbow


The better performance comes from doing fewer reads when indexes are accessed.

psychic tv

Though both produce the same number of rows, the Hash Join plan in Batch Mode reads 28 million rows, or about 21 million more rows than the Nested Loop Join plan in row mode. In this case, the double seek does far fewer reads, and even Batch Mode can’t cover that up.
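
If you want to verify the read counts on your own copy instead of trusting screenshots, the plan cache keeps running totals. A loose sketch (the LIKE filter is sloppy on purpose, just to find the demo query):

SELECT
    dest.text,
    deqs.execution_count,
    deqs.total_logical_reads
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE dest.text LIKE N'%Reputation >= 25000%'
ORDER BY deqs.total_logical_reads DESC;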

Part of the problem is that the optimizer isn’t psychic.

Fixing It


There are two ways I found to get the Nested Loop Join plan back.

The boring one, using a compat level hint:

SELECT u.Id, u.DisplayName, u.Reputation,
       (SELECT COUNT_BIG(*) FROM dbo.Posts AS pq WHERE pq.OwnerUserId = u.Id AND pq.PostTypeId = 1) AS q_count,
       (SELECT COUNT_BIG(*) FROM dbo.Posts AS pa WHERE pa.OwnerUserId = u.Id AND pa.PostTypeId = 2) AS a_count
FROM dbo.Users AS u
WHERE u.Reputation >= 25000
ORDER BY u.Id
OPTION(USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140'));

And the more fun one, rewriting the correlated subqueries as outer apply:

SELECT u.Id, u.DisplayName, u.Reputation, q_count, a_count
FROM dbo.Users AS u
    OUTER APPLY(SELECT COUNT_BIG(*) AS q_count FROM dbo.Posts AS pq WHERE pq.OwnerUserId = u.Id AND pq.PostTypeId = 1) AS q_count
    OUTER APPLY(SELECT COUNT_BIG(*) AS a_count FROM dbo.Posts AS pa WHERE pa.OwnerUserId = u.Id AND pa.PostTypeId = 2) AS a_count
WHERE u.Reputation >= 25000
ORDER BY u.Id;

Thanks for reading!

Experiment With Table Variable Deferred Compilation Without SQL Server 2019

I Have A Secret To Tell You


If you haven’t heard by now, Table Variables have some ✌features✌ that can cause performance issues pretty generally in SQL Server.

  • One row estimates unless you recompile (or that darn trace flag)
  • No column-level statistics (even with indexes)
  • Modifications to them can’t go parallel (without sorcery)

But in SQL Server 2019, Microsoft fixed one of those things, kind of, with Table Variable Deferred Compilation.

Rather than just give you that one row estimate, it’ll wait until you’ve loaded data in, and then it will use table cardinality for things like joins to the table variable. Just be careful when you use them in stored procedures.

That can be a lot more helpful than what you currently get, but the guesses aren’t quite as helpful when you start using a where clause, because there still aren’t column-level statistics. You get the unknown guess for those.

How Can You Test It Out Before SQL Server 2019?


You can use #temp tables.

That’s right, regular old #temp tables.

They’ll give you nearly the same results as Table Variable Deferred Compilation in most cases, and you don’t need trace flags, hints, or SQL Server 2019.

Heck, you might even fall in love with ’em and live happily ever after.
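
Here’s a minimal sketch of the comparison, assuming the Users and Posts tables from the Stack Overflow database. Run each batch with actual plans on, and compare the estimated rows feeding the join: pre-2019, the table variable gets the fixed one-row guess, while the #temp table gets the real cardinality, plus column-level statistics.

/* Table variable: pre-2019, the join sees a one-row estimate. */
DECLARE @u TABLE (Id int NOT NULL PRIMARY KEY);
INSERT @u (Id)
SELECT u.Id
FROM dbo.Users AS u
WHERE u.Reputation >= 100000;

SELECT COUNT_BIG(*) AS records
FROM @u AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id;
GO

/* #temp table: the join sees the real row count. */
CREATE TABLE #u (Id int NOT NULL PRIMARY KEY);
INSERT #u (Id)
SELECT u.Id
FROM dbo.Users AS u
WHERE u.Reputation >= 100000;

SELECT COUNT_BIG(*) AS records
FROM #u AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id;

DROP TABLE #u;
GO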

The Fine Print


I know, some of you are out there getting all antsy-in-the-pantsy about all the SQL Jeopardy differences between temp tables and table variables.

I also realize that this may seem overly snarky, but hear me out:

Sure, there are some valid reasons to use table variables at times. But to most people, the choice about which one to use is either a coin flip or a copy/paste of what they saw someone else do in other parts of the code.

In other words, there’s not a lot of thought, and probably no adequate testing behind the choice. Sort of like me with tattoos.

Engine enhancements like this that benefit people who can’t change the code (or who refuse to change the code) are pretty good indicators of just how irresponsible developers have been with certain ✌features✌. I say this because I see it week after week after week. The numbers in Azure had to have been profound enough to get this worked on, finally.

I can’t imagine how many Microsoft support tickets have been RCA’d as someone jamming many-many rows in a table variable, with the only reasonable solution being to use a temp table instead.

I wish I could say that people learned the differences a lot faster when they experienced some pain, but time keeps proving that’s not the case. And really, it’s hardly ever the software developers who feel it with these choices: it’s the end users.

Thanks for reading!

If you like learning about performance tuning topics like this, check out my video training site.

Query Tuning SQL Server 2019 Part 5: I’m Not Going Back

Butt Out Bag


There was one thing that I didn’t talk about earlier in the week.

You see, there’s a mystery plan.

It only shows up once in a while, like Planet X. And when it does, we get bombarded by asteroids.

Just like when Planet X shows up.

I wouldn’t call it a good all-around plan, but it does something that we would want to happen when we run this proc for VoteTypeId 5.

Let’s go look!

The Optimizer Discovers Aggregates, Sort Of


This isn’t a good “general” plan. In fact, for any of the previously fast values, it sucks.

It sucks because just like the “optimize for unknown” plan, it has a bunch of startup costs, does a lot of scanning, and is generally a bad choice for VoteTypeIds that produce a small number of values.

Ghost Town

Johnny Four


If you look carefully, you can see what the problem is.

For VoteTypeIds that filter out a lot of rows (which is most of them), that predicate doesn’t get applied until after Posts and Badges have been joined.

In other words, you fully join those tables, and then the result of that join is joined to the predicate-filtered result of Votes.

For this execution, the plan was compiled initially for VoteTypeId 2. It has 130 million entries in Votes. It’s the only VoteTypeId that produces this plan naturally.

The plan you’re looking at above was re-executed with VoteTypeId 4, which has… 8,190 rows in Votes.

I can’t stress enough how difficult it would be to figure out why this is bad just looking at estimated plans.

Though one clue would be the clustered index scan + predicate, if we knew that we had a suitable index.

2legit

This kind of detail with row discrepancies only surfaces with actual plans.

But there is one thing here that wasn’t showing up in other plans, when we wanted it to: The optimizer decides to aggregate OwnerUserId coming from the Posts table prior to joining to Votes.

Johnny Five


If you recall the previously used plan, one complaint was that the result of joining Posts and Badges, then joined to Votes, had to probe 932 million rows.

You can sort of see that here, where the Adaptive Join prior to the highlighted Hash Match Aggregate produces >100 million rows. It’s more here because we don’t have Bitmaps against both Posts and Badges, but… We’re going off track a bit with that.

That could have been avoided if the optimizer had decided to aggregate OwnerUserId, like it does in this plan.

To compare:

gag order

The top plan has a handy green square to show you a helpful pre-join aggregation.

The bottom plan has no handy green squares because there is no helpful pre-join aggregation.

The product of the aggregation is 3.2 million rows, which is exactly what we got as a distinct count when we began experimenting with temp tables:

SELECT COUNT_BIG(DISTINCT p.OwnerUserId) AS records --3,236,013 
FROM dbo.Posts AS p 
JOIN dbo.Badges AS b 
    ON b.UserId = p.OwnerUserId 
WHERE p.PostTypeId = 1;

Outhouse


If the optimizer had chosen to aggregate OwnerUserId prior to the join to Votes, we all could have gone home early on Friday and enjoyed the weekend.

Funny, that.

Speaking of which, it’s Friday. Go enjoy the weekend.

Thanks for reading!

This week I’m having a sale on my SQL Server 2019 course, normally $99.95.

If you want to see the entire thing, it’s available this week for just $19.99.

All you have to do is add it to your cart, and the discount will be applied at checkout.

If you like what you see here, sign up for my email list to get 50% off your next purchase.

Query Tuning SQL Server 2019 Part 4: Long Live The Query Tuner

Rumors Of My Demise


Let’s talk about some common hints that people use to fix parameter sniffing:

  • RECOMPILE: Won’t work here to get us a better plan for VoteTypeId 5, because it sucks when the optimizer knows what’s coming
  • OPTIMIZE FOR UNKNOWN: Works like once every 5 years, but people still bring it up, and really sucks here (picture below)
  • OPTIMIZE FOR (VALUE): Plan sharing doesn’t work great generally — if we were gonna do this, it’d have to be dynamic

This is what happens when we optimize for unknown. The density vector guess is 13,049,400.

Stop it with this.
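
Where does 13,049,400 come from? The unknown guess for an equality predicate is the table’s row count multiplied by the column’s all-density. A sketch of how to check, assuming the rant index on dbo.Votes that gets created in Part 2 of this series:

/* The "All density" value for VoteTypeId, multiplied by the row count
   of dbo.Votes, should land right around the optimizer's unknown guess. */
DBCC SHOW_STATISTICS ('dbo.Votes', rant) WITH DENSITY_VECTOR;

SELECT COUNT_BIG(*) AS total_rows
FROM dbo.Votes AS v;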

That guess for Vote Types with very few rows ends up with a plan that has very high startup costs.

This version of the query will run for 13-17 seconds for any given parameter. That sucks in zero gravity.

Pictured above is the plan for VoteTypeId 4, which previously finished sub-second using Plan 1 and Plan 2.

With those out of the way, how can we fix this thing?

The Mint


In some circumstances, a #temp table would help if we pre-staged rows from Votes.

The problem is that for many calls, we’d be putting between 7 and 130 MILLION rows into a temp table.

Not my idea of a good time.

RAMDISKS NINETY NINE CENTS

But what about the other part of the query?

If we count up distinct OwnerUserIds, we get about 3.2 million.

Better, we can reduce the rows further in the procedure with an EXISTS to Votes (I’ll show you that in a minute).

SELECT COUNT_BIG(DISTINCT p.OwnerUserId) AS records --3,236,013
FROM dbo.Posts AS p
JOIN dbo.Badges AS b 
    ON b.UserId = p.OwnerUserId 
WHERE p.PostTypeId = 1;

That’s not too bad, depending on:

  • How frequently it runs
  • How concurrently it runs
  • How overwhelmed tempdb is
  • Your Mom

The Product


That gives us:

CREATE OR ALTER PROCEDURE dbo.VoteSniffing ( @VoteTypeId INT )
AS
BEGIN
SET XACT_ABORT, NOCOUNT ON;

SELECT DISTINCT p.OwnerUserId
INTO #p
FROM dbo.Posts AS p
JOIN dbo.Badges AS b
    ON b.UserId = p.OwnerUserId
WHERE p.PostTypeId = 1
AND EXISTS
(
    SELECT 1/0
    FROM dbo.Votes AS v
    WHERE v.UserId = p.OwnerUserId
    AND   v.VoteTypeId = @VoteTypeId
);

SELECT   ISNULL(v.UserId, 0) AS UserId,
         SUM(CASE WHEN v.CreationDate >= '20190101'
                  AND  v.CreationDate < '20200101'
                  THEN 1
                  ELSE 0
             END) AS Votes2019,
         SUM(CASE WHEN v.BountyAmount IS NULL
                  THEN 0
                  ELSE 1
             END) AS TotalBounty,
         COUNT(DISTINCT v.PostId) AS PostCount,
         @VoteTypeId AS VoteTypeId
FROM     dbo.Votes AS v WITH(FORCESEEK)
WHERE    v.VoteTypeId = @VoteTypeId
AND      NOT EXISTS
        (   
            SELECT 1/0
            FROM #p AS p
            WHERE  p.OwnerUserId = v.UserId
        )
GROUP BY v.UserId;

END;
GO

Which works pretty well across all calls, and avoids the primary issue with VoteTypeId 5.

Navy Blue

I’m generally happy with this, with the slight exception of VoteTypeId 8. Yeah, it beats the pants off of when we sniff Plan 2, but it’s about 7 seconds slower than when we get Plan 1.

I pulled the 17 minute execution from this graph for Plan 2/VoteTypeId 5, too, because it’s so distracting. Not having to worry about that thing is a trade off I’m willing to make for Plan 3 being about a second slower than Plan 1.

Not bad for a lazy Sunday afternoon of blogging, though.

Save One For Friday


Query tuning in SQL Server 2019 isn’t always a whole lot different from performance tuning other versions of SQL Server.

You have some more help from optimizer features (especially if you’re on Enterprise Edition), but they don’t solve every problem, and you can run into some very common problems that you’re already used to solving.

You may even be able to use some very familiar techniques to fix things.

In tomorrow’s post, I want to look at a quirk that would have thrown us way off course to explore on our way here.

Thanks for reading!


Query Tuning SQL Server 2019 Part 3: Who Died And Made You The Optimizer?

Be Yourself


We’ve got a problem, Sam Houston. We’ve got a problem with a query that has some strange issues.

It’s not parameter sniffing, but it sure could feel like it.

  • When the procedure compiles and runs with VoteTypeId 5, it runs for 12 minutes
  • Other VoteTypeIds run well with the same plan that VoteTypeId 5 gets
  • When VoteTypeId 5 runs with a “small” plan, it does okay at 10 seconds

Allow me to ruin a graph to illustrate. The Y axis is still seconds, but… it goes up a little higher now.

weigh-in

The Frustration (A Minor Digression)


Here’s where life can be tough when it comes to troubleshooting actual parameter sniffing.

If you’re relying solely on the plan cache, you’re screwed. You’ll see the plan, and the compile value, but you won’t have the runtime value anywhere that “caused” the problem. In other words, the set of parameters that were adversely affected by the query plan that didn’t fit.

There are some things that can help, like if you’re watching it happen live, or if you have a monitoring tool that might capture runtime parameters.

OR IF YOU USE SP UNDERSCORE HUMANEVENTS.

Like I said, this isn’t parameter sniffing, but it feels like it.

It could extra-feel like it because you might see a misbehaving query, and a compile-time parameter that runs quickly on its own when you test it, e.g. VoteTypeId 6.

It would be really hard to tell that even if a plan were to compile specifically for a different parameter, it would still run for 12 minutes.

Heck, that’d even catch me off-guard.

But that’s what we have here: VoteTypeId 5 gets a bad plan special for VoteTypeId 5.

Examiner


Let’s dig in on what’s happening to cause us such remarkable grief. There has to be a reason.

I don’t need more grief without reason; I’ve already got a public school education.

I WANT TO KNOW

To summarize the problem here: that Hash Match Left Anti Semi Join runs for 12 minutes on its own.

No other operator, or group of operators, is responsible for a significant amount of time comparatively.

Magnifier


Some things to note:

  • The bad estimates aren’t from predicates, they’re from Batch Mode Bitmaps
  • Those bad estimates end up producing a much larger number of rows from the Adaptive Join
  • The Hash Match ends up needing to probe 932 million rows

 

el disastero

Taking 12 minutes to probe 932 million rows is probably to be expected, now that I think about it.

If the optimizer had a good estimate from the Bitmaps here, it may have done the opposite of what a certain Pacific Island Dwelling Bird said:

Getting every nuance of this sort of relational transformation correct can be tricky. It is very handy that the optimizer team put the effort in so we do not have to explore these tricky rewrites manually (e.g. by changing the query text). If nothing else, it would be extremely tedious to write all the different query forms out by hand just to see which one performed better in practice. Never mind choosing a different version depending on current statistics and the number of changes to the table.

In this case, the Aggregate happens after the join. If the estimate were correct, or even in the right spacetime dimension, this would be fine.

We can gauge the general efficiency of it by looking at when this plan is used for other parameters that produce numbers of rows that are closer to this estimate.

huey

If the optimizer had made a good guess for this parameter, it may have changed the plan to put an aggregate before the join.

Unfortunately we have very little control over estimates for Bitmaps, and the guesses for Batch Mode Bitmaps are a Shrug of Atlassian proportions.

Finisher


We’ve learned some things:

  1. This isn’t parameter sniffing
  2. Batch Mode Bitmaps wear pants on their head
  3. Batch Mode Bitmaps set their head-pants on fire
  4. Most of the time Batch Mode performance covers this up
  5. The plan for VoteTypeId 5 is not a good plan for VoteTypeId 5
  6. The plan for VoteTypeId 5 is great for a lot of other VoteTypeIds

In tomorrow’s post, we’ll look at how we can fix the problem.

Thanks for reading!


Query Tuning SQL Server 2019 Part 2: Big Databases, Big Ideas

Are We Still Friends?


When I first wrote this demo, I called it dbo.ParameterSniffingMonstrosity.

Because, you know, it’s really terrible.

CREATE OR ALTER PROCEDURE dbo.VoteSniffing( @VoteTypeId INT )
AS
SET XACT_ABORT, NOCOUNT ON;
    BEGIN
        SELECT   ISNULL(v.UserId, 0) AS UserId,
                 SUM(CASE WHEN v.CreationDate >= '20190101'
                          AND  v.CreationDate < '20200101'
                          THEN 1
                          ELSE 0
                     END) AS Votes2019,
                 SUM(CASE WHEN v.BountyAmount IS NULL
                          THEN 0
                          ELSE 1
                     END) AS TotalBounty,
                 COUNT(DISTINCT v.PostId) AS PostCount,
                 @VoteTypeId AS VoteTypeId
        FROM     dbo.Votes AS v
        WHERE    v.VoteTypeId = @VoteTypeId
        AND      NOT EXISTS
                (   
                    SELECT 1/0
                    FROM dbo.Posts AS p
                    JOIN dbo.Badges AS b 
                        ON b.UserId = p.OwnerUserId 
                    WHERE  p.OwnerUserId = v.UserId
                    AND    p.PostTypeId = 1 
                )
        GROUP BY v.UserId;
    END;
GO

The only parameter is for VoteTypeId, which has some pretty significant skew towards some types, especially in the full size Stack Overflow database.
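
The picture below shows the distribution. If you want to see it on your own copy, a quick count does the job:

SELECT
    v.VoteTypeId,
    COUNT_BIG(*) AS records
FROM dbo.Votes AS v
GROUP BY v.VoteTypeId
ORDER BY records DESC;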

Ask me about my commas

It’s like, when people tell you to index the most selective column first, well.

  • Sometimes it’s pretty selective.
  • Sometimes it’s not very selective.

But this is exactly the type of data that causes parameter sniffing issues.

With almost any data set like this, you can draw a line or three, and values within each block can share a common plan pretty safely.

But crossing those lines, you run into issues where either little plans do far too much looping and seeking and sorting for “big” values, and big plans do far too much hashing and scanning and aggregating for “little” values.

This isn’t always the exact case, but generally speaking you’ll observe something along these lines.

It’s definitely not the case for what we’re going to be looking at this week.

This week is far more interesting.

That’s why it’s a monstrosity.

Fertilizer


The indexes that I create to support this procedure look like so. I’ve started using compression, since at this point in time 2016 SP1 is commonplace enough that even people on Standard Edition can use it. They work quite well for the majority of values and query plans.

CREATE INDEX igno
ON dbo.Posts 
    (OwnerUserId, PostTypeId)
    WHERE PostTypeId = 1 
WITH(MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = ROW);
GO

CREATE INDEX rant
ON dbo.Votes 
    (VoteTypeId, UserId, PostId)
INCLUDE 
    (BountyAmount, CreationDate) 
WITH(MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = ROW);
GO 

CREATE INDEX clown ON dbo.Badges( UserId ) 
WITH(MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = ROW);
GO

If there are other indexes you’d like to test, you can do that locally.

What I want to point out is that for many values of VoteTypeId, the optimizer comes up with very good, very fast plans.

Good job, optimizer.

In fact, for any of these runs, you’ll get a good enough plan for any of the other values. They share well.

EXEC dbo.VoteSniffing @VoteTypeId = 4;
EXEC dbo.VoteSniffing @VoteTypeId = 6;
EXEC dbo.VoteSniffing @VoteTypeId = 7;
EXEC dbo.VoteSniffing @VoteTypeId = 9;
EXEC dbo.VoteSniffing @VoteTypeId = 11;
EXEC dbo.VoteSniffing @VoteTypeId = 12;
EXEC dbo.VoteSniffing @VoteTypeId = 13;
EXEC dbo.VoteSniffing @VoteTypeId = 14;
EXEC dbo.VoteSniffing @VoteTypeId = 15;
EXEC dbo.VoteSniffing @VoteTypeId = 16;

VoteTypeIds 1, 2, 3, 5, 8, and 10 have some quirks, but even they mostly do okay using one of these plans.

There are two plans you may see occur for these.

Plan 1

teeny tiny

Plan 2

it has adapted

Particulars & Peculiars


Plan 1 is first generated when the proc is compiled with VoteTypeId 4, and Plan 2 is first generated when the proc is compiled with VoteTypeId 6.

There’s a third plan that only gets generated when VoteTypeId 2 is compiled first, but we’ll have to save that for another post, because it’s totally different.
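
If you want to flip between the two plans locally, mark the proc for recompilation and control which parameter runs first. A sketch:

EXEC sys.sp_recompile @objname = N'dbo.VoteSniffing';
GO
EXEC dbo.VoteSniffing @VoteTypeId = 4; --next compile sniffs 4: Plan 1
GO

EXEC sys.sp_recompile @objname = N'dbo.VoteSniffing';
GO
EXEC dbo.VoteSniffing @VoteTypeId = 6; --next compile sniffs 6: Plan 2
GO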

Here’s how each of those plans works across other possible parameters.

this is my first graph

Plan 1 is grey, Plan 2 is blue. It’s pretty easy to see where each one is successful, and then not so much. Anything < 100ms got a 0.

The Y axis is runtime in seconds. A couple are quite bad. Most are decent to okay.

Plans for Types 2 and 8 obviously stick out, each under a different plan.

This is one of those things I need to warn people about when they get wrapped up in:

  • Forcing a plan (e.g. via Query Store or a plan guide)
  • Optimizing for unknown
  • Optimizing for a specific value
  • Recompiling every time (that backfires in a couple cases here that I’m not covering right now)

One thing I need to point out is that Plan 2 doesn’t have an entry here for VoteTypeId 5. Why?

Because when it inherits the plan for VoteTypeId 6, it runs for 17 minutes.

singalong

This is probably where you’re wondering “okay, so what plan does 5 get when it runs on its own? Is this the mysterious Plan 4 From Outer Space?”

Unfortunately, the plan that gets generated for VoteTypeId 5 is… the same one that gets generated for VoteTypeId 6, but 6 has a much smaller memory grant.

If you’re not used to reading operator times in execution plans, check out my video here.

Since this plan is all Batch Mode operators, each operator will track its time individually.

The Non-Switch


VoteTypeId 5 runtime, VoteTypeId 6 compile time

If I were to put a 17 minute runtime in the graph (>1000 seconds), it would defeat the purpose of graphing things.

Note the Hash Match has, by itself, 16 minutes and 44 seconds of runtime.

pyramids

VoteTypeId 5 runtime, and compile time

This isn’t awesome, either.

The Hash Join, without spilling, has 12 minutes and 16 seconds of runtime.

lost

Big Differentsiz


You have the same plan shape and operators. Even the Adaptive Join follows the same path to hash instead of loop.

Sure, the spills account for ~4 minutes of extra time. They are fairly big spills.

But the plan for VoteTypeId 5, even when compiled specifically for VoteTypeId 5… sucks, and sucks royally.

There are some dismally bad estimates, but where do they come from?

We just created these indexes, and data isn’t magically changing on my laptop.

TUNE IN TOMORROW!

Thanks for reading!
