How OPTIMIZE FOR UNKNOWN Makes Troubleshooting SQL Server Performance Problems Harder

Detained


Despite the many metric tons of blog posts warning people about this stuff, I still see many local variables and optimize for unknown hints. As a solution to parameter sniffing, it’s probably the best choice 1/1000th of the time. I still end up having to fix the other 999/1000 times, though.

In this post, I want to show you how using either optimize for unknown or local variables makes my job — and the job of anyone trying to fix this stuff — harder than it should be.

Passenger


Like most things, we’re going to start with an index:

CREATE INDEX r ON dbo.Users(Reputation);
GO 

I’m going to have a stored procedure that uses three different ways to pass a value to a where clause:

CREATE OR ALTER PROCEDURE 
    dbo.u 
(
    @r int, 
    @u int
)
AS
BEGIN

    /* Regular parameter */
    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Users AS u
    WHERE u.Reputation = @r
    AND   u.UpVotes = @u;

    /* Someone who saw someone else do it at their last job */
    DECLARE 
        @LookMom int = @r,
        @IDidItAgain int = @u;
    
    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Users AS u
    WHERE u.Reputation = @LookMom
    AND   u.UpVotes = @IDidItAgain;

    /* Someone who read the blog post URL wrong */
    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Users AS u
    WHERE u.Reputation = @r
    AND   u.UpVotes = @u
    OPTION(OPTIMIZE FOR UNKNOWN);

END;
GO

First Way


The best case is we run this for a small number of rows, and no one really notices. Even though we get bad guesses for the second two queries, it’s not a huge deal.

[SQL Server query plan: hands on]

When you run procs like this, SQL Server doesn’t cache the compile time values the same way it does when you use parameters. Granted, this is because it technically shouldn’t matter, but if you’re looking for a way to execute the proc again to reproduce the issue, it’s up to you to go figure out what someone did.

[SQL Server query plan]

Since I’m getting the actual plans here, I get the runtime values for both, but those don’t show up in the plan cache or query store version of plans.

That’s typically a huge blind spot when you’re trying to fix performance issues of any kind, but it’s up to you to capture that stuff.

Just, you know, good luck doing it in a way that doesn’t squash performance.
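
For the plain parameter version, the compiled values do get stored in the cached plan’s XML, and you can dig them out after the fact. Here’s a rough sketch against the demo proc above; only parameters show up this way, which is exactly the point — local variables and optimize for unknown leave you nothing to find.

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    proc_name = OBJECT_NAME(st.objectid, st.dbid),
    parameter = pl.cr.value('@Column', 'sysname'),
    compiled_value = pl.cr.value('@ParameterCompiledValue', 'nvarchar(200)')
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('//ParameterList/ColumnReference') AS pl(cr)
WHERE st.objectid = OBJECT_ID(N'dbo.u');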

Second Way


In this example, our index is only on the Reputation column, but our where clause is also on the UpVotes column.

In nearly every situation, it’s better to have your query do all the filtering it can from one index source — there are obviously exceptions — but the point here is that the optimizer doesn’t bother with a missing index request for the second two queries, only for the first one.

That doesn’t matter a ton if you’re looking at the query and plan right in front of you, but if you’re also using the missing index DMVs to get some idea about how useful overall a new index might be, you’re out of luck.
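
If you do lean on those DMVs, this is roughly the sort of query people use to eyeball them; the point is that the second two queries in the proc never register here at all.

SELECT
    mid.statement,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.user_seeks,
    migs.avg_user_impact
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
ORDER BY migs.user_seeks DESC;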

[SQL Server query plan: mattered]

In this case, the optimizer doesn’t think the second two plans are costly enough to warrant anything, but it does for the first plan.

I’m not saying that queries with local variables or optimize for unknown hints always do this, or that parameterized plans will always ask for (good) indexes. There are many issues with costing and SARGability that can prevent them from showing up, including getting a trivial plan.

This is just a good example of how Doing Goofy Things™ can backfire on you.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

My SQL Server Query Ran For A Long Time But Didn’t Use A Lot Of CPU: What Happened?

Of Walls And Clocks


No one ever says a broken record is right twice a day, perhaps because DJs are far more replaceable than clock makers.

I say that only to acknowledge that I may sound like a broken record when I say that when you’re tuning a query, it’s quite important to compare CPU time and wall clock time (duration). Things you should note:

  • If CPU and duration were about equal in a serial plan, that’s normal
  • If CPU is much higher than duration in a parallel plan, that’s normal
  • If duration and CPU are about equal in a parallel plan, you’ve got yourself a situation
  • If duration is much higher than CPU in any plan, something else held your query up

In this post, I’m going to outline a non-exhaustive list of reasons why that last bullet point just might be happening to you.

Hammer Time


Big Data: One common reason you may run into is that you’re returning a large result set, either locally to SSMS, or to an app server that is either overloaded or underpowered. It’s also possible that something is happening on the application side that’s slowing things down. In these cases, you’ll usually see a lot of ASYNC_NETWORK_IO waits. To better test the speed of the actual query, you can dump the results into a #temp table.
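
For example, a quick way to take the network and the application out of the picture is to dump the results into a #temp table and compare timings. A minimal sketch, with an illustrative column list:

SELECT
    u.Id,
    u.DisplayName,
    u.Reputation
INTO #results
FROM dbo.Users AS u
WHERE u.Reputation > 10000;

DROP TABLE #results;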

Blocking: Another quite common issue is that the query you’re running is getting blocked. Before you go reaching for that NOLOCK hint, make sure you know what it does. Blocking is easy to check on with sp_WhoIsActive. If you see your query waiting on wait types that start with LCK_, it’s being blocked. Some common ones are LCK_M_SCH_S, LCK_M_SCH_M, LCK_M_S, LCK_M_U, LCK_M_X, LCK_M_IS, LCK_M_IU, and LCK_M_IX. While your query is being blocked, it’s just gonna rack up wall clock time while using zero CPU.
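
As a hedged example, this is the sort of call I’d start with; sp_WhoIsActive is a free community procedure, and these parameter names come from its documentation, so check them against the version you have installed.

EXEC dbo.sp_WhoIsActive
    @find_block_leaders = 1,
    @sort_order = '[blocked_session_count] DESC';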

Stats updates: Once in a while I’ll run into a query that runs slowly the first time and fast the second time because behind the scenes the query had to wait on stats to update. It’s a bit hard to figure out unless you’re on SQL Server 2019, but it can totally make your query look like you sat around doing nothing for a chunk of time, especially if you’re waiting on large tables, or a bunch of smaller updates.

Reading from disk: If the tables or indexes you’re reading are bigger than your buffer pool, your queries are gonna eat it going to disk to read data in. If this is your limitation, you’ll see a lot of PAGEIOLATCH_SH or PAGEIOLATCH_EX waits, depending on whether your query is reading data, writing data, or both.

Waiting for a worker thread: When your queries can’t get a worker thread to run on, they end up waiting on THREADPOOL. While some waits on it are to be expected, you don’t want queries to wait long periods of time on this. It’s a serious sign your server is jammed way up.

Waiting for memory: When your queries can’t get the memory grant they want, they sit around waiting on RESOURCE_SEMAPHORE. Just like above, it’s a sure sign your server is having problems if it’s a wait you see occurring a lot, or for long average periods of time.

Waiting for a query plan: I know, this sounds like a long compilation time — and it sort of is — but only because the query is waiting a long time to get memory to compile an execution plan. This wait is going to be RESOURCE_SEMAPHORE_QUERY_COMPILE, instead.

Query Plan Compilation: Sometimes the optimizer gets a bit carried away and spends a whole bunch of time in the search2 phase trying to reorder joins, and doing other tricks to make your query faster. Under extreme circumstances, you might wait a really long time for that query plan. There are no wait stats to tell you that, but if you look at your query plan’s properties (F4 key on the root operator), you can see the compile time.
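
If you don’t have the plan handy, SET STATISTICS TIME will also surface parse and compile time for a single run. A quick sketch, using a simple query as a stand-in:

SET STATISTICS TIME ON;

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.Reputation = 2;

SET STATISTICS TIME OFF;

/* Messages tab: "SQL Server parse and compile time: CPU time = n ms, elapsed time = n ms." */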

[SQL Server query plan: yeesh]

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

UDF Inlining And String Aggregations In SQL Server 2019

Quatro Enemigos


This post is really four different posts. Maybe five. The main points are:

Let’s start at the top, because tops are top.

Strung Out


If you’re like me and you got excited by the induction of STRING_AGG into the T-SQL Lexicon because of all the code odd-balling it would replace, you were likely also promptly disappointed for a few reasons.

  • Needing to convert the element to be AGGed to a MAX to avoid errors
    • STRING_AGG aggregation result exceeded the limit of 8000 bytes. Use LOB types to avoid result truncation
  • The WITHIN GROUP ordering is clunky compared to an outer ORDER BY (but hey, logical query processing…)
  • No support for DISTINCT in the function, and an outer DISTINCT tries to make the wrong thing DISTINCT (see above)

And of course, it’s a breaking limitation for UDF inlining.

The UDF does not reference the STRING_AGG function

Let’s look at all of that stuff at once, in one function.

CREATE OR ALTER FUNCTION 
    dbo.IsStringAgg (@UserId int)
RETURNS 
    nvarchar(max)
WITH 
    RETURNS NULL ON NULL INPUT, 
    SCHEMABINDING
AS
BEGIN
RETURN
(
    SELECT
        STRING_AGG
        (
            CONVERT
            (
                nvarchar(MAX),
                b2.Name
            ), 
            N', '
        )
        WITHIN GROUP 
        (
            ORDER BY 
                b2.Name
        )
    FROM
    (
        SELECT DISTINCT 
            b.Name
        FROM dbo.Badges AS b
        WHERE b.UserId = @UserId
    ) AS b2
);
END;
GO

Not exactly a thing of beauty, is it? Let’s hold onto that for one second, though.

XML > JSON


Okay, so let’s get back to that UDF documentation.

The UDF does not reference XML methods

That sort of reads like NOXML4U, which is my friend Jeremiah’s license plate. In reality though, it means all the .method thingies, like value, nodes, query, etc.

So if you were to try to follow my prior advice on string concatenation, the function couldn’t be inlined.

SELECT
   x = 
   (
       SELECT 
           [text()] = 
               b.Name
       FROM dbo.Badges AS b
       WHERE b.Id = 100564
       FOR XML 
           PATH(''),
           TYPE
   ).value
     (
         './text()[1]', 
         'nvarchar(max)'
     );

Having the .value breaks inlining. So there’s that. I believe this is where Reddit users post something like le sigh with some poorly drawn stick figure.

Del The Funktion


We can write the function semi-correctly like so:

CREATE OR ALTER FUNCTION 
    dbo.NotStringAgg (@UserId int)
RETURNS 
    nvarchar(max)
WITH 
    RETURNS NULL ON NULL INPUT, 
    SCHEMABINDING
AS
BEGIN
RETURN
    STUFF
    (
        (
            SELECT
                N', ' + 
                b.Name
            FROM dbo.Badges AS b
            WHERE b.UserId = @UserId
            GROUP BY b.Name
            ORDER BY b.Name
            FOR XML PATH (N'')
        ), 
        1,
        2,
        N''
    );
END;
GO

Alright, now let’s BRING THAT CODE BACK (air horn)

Sea Vessel Soliloquy


If we compare the execution plans for these two functions, the XML one gets inlined, and the STRING_AGG one does not.

SELECT TOP (5000)
    u.DisplayName,
    ThanksForAllYourHardWork = 
        dbo.NotStringAgg(u.Id)
FROM dbo.Users AS u
ORDER BY u.Reputation DESC;
GO 

SELECT TOP (5000)
    u.DisplayName,
    ThanksForAllYourHardWork = 
        dbo.IsStringAgg(u.Id)
FROM dbo.Users AS u
ORDER BY u.Reputation DESC;
GO

Here’s the plan for the inlined function:

[SQL Server query plan: blaze it]

Here’s the plan for the non-inlined function:

[SQL Server query plan: thin ice]

Stay Thirsty


The inlined function finishes about twice as fast, though one may pause for a moment to consider whether the 400ms difference is an earth shattering kaboom in this case.

Of course, the real kicker is when scalar UDFs are invoked as part of larger queries where parallelism, etc. is important.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Why Reporting Tables Aren’t A Great Idea In SQL Server Databases

Headway


I’ve seen a lot of crazy ways for people to run reports on live data, but one of the worst ideas is creating a reporting table based on the set of data a user wants, and then letting them query that data.

Here’s why:

  • You still have to run the query to populate them, which hits the server with load you’re averse to
  • The buffer pool gets taken up by all these reporting tables, which only one person uses
  • It’s hard to index the reporting tables, and column store isn’t useful on Standard Edition
  • They get stale quickly, and refreshing them means re-running the original query, or
  • A dynamic Merge statement, because no two tables will have the same columns
  • Generic reporting tables will often use the wrong data types for columns and be very wide

Put all that together, and you’ve got a pretty bad solution for your server and your users.

Money Talks


I’m gonna say that there are better ways to do this, and I’m gonna be right, but they’re all gonna take work and/or money.

The ones that take money are reporting off a copy of the data, which you can do with Availability Groups, Replication, or Log Shipping. You have to fully license the other server that you’re reading from no matter which you choose.

I have no interest in deep-diving into any one of those here, but I’m gonna generalize a little bit to make sure you’re not totally in the dark:

  • Availability Groups are easy to set up, sometimes complicated to manage and patch, and require databases in full recovery model, but give you pretty up-to-date data on the secondary for users to get at.
  • Replication is a lot like the above, except it makes me way more nervous. Snapshot Replication gets bonus points for allowing you to create different indexes on the subscriber, which AGs and Log Shipping don’t let you do. The freshness of the data does depend on which type of Replication you choose, so choose wisely.
  • Log Shipping just ships transaction log backups, which is pretty dead simple unless you need to deal with servers that can’t directly see each other for whatever reason. Simple is nice, but simple means you’re gonna have stale data, since you have to kick users out to restore more transaction logs.

Manifesto


The solutions that require work are usually ones that you implement locally. I’m not going to talk about data warehouses here, because it’s a lot like the stuff up there, just with ETL in the middle. Or ELT. Whatever.

Getting reporting working from live data usually takes a few steps, depending on your situation:

  • Nonclustered column store indexes if you’re on Enterprise Edition
  • Indexed views if you’re on Standard Edition (I’ll talk more about this in a minute)
  • Enabling snapshot isolation just for reporting queries to not get blocked
  • Tuning the hell out of your report code to not beat the server up too bad

Indexed views have a lot of rules, and if you’re lucky you can get some mileage out of them to speed up aggregations on Standard Edition. You can try using column store, but it’s really severely hobbled on Standard Edition. The DOP two limit is especially a killer in a lot of cases.
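
As a sketch of the indexed view route, assuming a Stack Overflow style dbo.Posts table like the one used elsewhere in these demos, a pre-aggregated view might look like this:

CREATE OR ALTER VIEW dbo.UserPostStats
WITH SCHEMABINDING
AS
SELECT
    p.OwnerUserId,
    PostCount = COUNT_BIG(*),
    TotalScore = SUM(ISNULL(p.Score, 0))
FROM dbo.Posts AS p
GROUP BY
    p.OwnerUserId;
GO

CREATE UNIQUE CLUSTERED INDEX ups
ON dbo.UserPostStats (OwnerUserId);
GO

Keep in mind that on Standard Edition, queries need the NOEXPAND hint on the view for its index to actually get used; Enterprise Edition can match it automatically.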

All of this is work though, and it at least partially assumes your server can come anywhere near being able to handle the additional workload. If it’s just limping along now, you probably don’t wanna keep throwing straw at Joe Camel.

Out There


What I left out a little bit are tables that everyone can report from, because every time I’ve seen them implemented, it’s been an absolute mess of triggers or periodic data synchronizations that left users unhappy.

Either because their transactional workload slowed way down to keep things up to date synchronously, or because the background process that started out running every 10 minutes beat the crap out of the server every 10 minutes, so now it runs every hour but there’s another Agent job to kill that job if it runs more than five minutes because that means something bad is about to happen.

This is why most of the time, I’m happy to do the work to report off the live data.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Measuring Logical And Physical Reads In SQL Server Queries

In Process


One question I get quite a bit is how often I look at queries that do a lot of logical reads, and the answer is: not very often.

The primary metric that I look at is CPU, both total and average. If you look at that, you’ll most likely also find the type of queries that do a lot of reads, because they’re missing a better index somewhere along the way.

If queries run really fast, I don’t really care what they’re doing for logical reads, which are all from pages in memory anyway.

It may be useful to look at in parameter sniffing scenarios, because a query might sometimes do a lot of reads, and sometimes not. But at the same time, that query will also sometimes use a lot of CPU, and sometimes not.

And uh, no, I don’t look at it as a sign that I have to defragment indexes. At least not based on what index maintenance scripts measure.

Phys Ed


Queries — and servers in general — that have a high number of physical reads are a different story.

Looking at these, sure, you might find queries that are missing an index, but most likely what you’re going to find is a server that’s a bit malnourished in the RAM department. Physical reads come from disk, and if your queries are constantly piling these up, then you’re gonna have some stuff to look at.

  • Do you have anywhere near enough memory to cache your most frequently accessed objects in memory?
  • Do you have queries coming along asking for enormous memory grants that clear out swaths of the buffer pool?
  • Do you have too many unnecessary indexes competing for space in the buffer pool and knocking each other out?

Figuring that stuff out can be tough, but can be made easier with a good monitoring tool.
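
One rough way to answer the first question is to see what’s actually sitting in the buffer pool. A sketch that sums cached pages per database:

SELECT
    database_name = DB_NAME(bd.database_id),
    cached_gb = COUNT_BIG(*) * 8. / 1024. / 1024.
FROM sys.dm_os_buffer_descriptors AS bd
GROUP BY
    bd.database_id
ORDER BY
    cached_gb DESC;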

Literacy


One thing that will separate servers where queries do a lot of logical reads from servers that do a lot of physical reads is wait stats.

Servers where queries do a lot of reading pages from disk into memory — physical reads — are going to have higher waits on PAGEIOLATCH_SH and PAGEIOLATCH_EX.
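
A quick sketch of checking for those waits server-wide since the last restart:

SELECT
    dows.wait_type,
    dows.waiting_tasks_count,
    wait_time_seconds = dows.wait_time_ms / 1000.
FROM sys.dm_os_wait_stats AS dows
WHERE dows.wait_type LIKE N'PAGEIOLATCH%'
ORDER BY
    dows.wait_time_ms DESC;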

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Steps For Getting Rid Of NOLOCK Hints In SQL Server Queries

Way Out


Whenever I see people using NOLOCK hints, I try to point out that they’re not a great idea, for various reasons explained in detail all across the internet.

At minimum, I want them to understand that the hint behaves the same as setting the entire transaction isolation level to READ UNCOMMITTED, and that the hint name is quite misleading. It doesn’t mean your query takes no locks, it means your query ignores locks taken by other queries.

That’s how you can end up getting incorrect results.
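
In other words, these two forms behave the same way; the only difference is that the hint applies to one table reference, while the SET option applies to everything in the session. The table here is just illustrative.

/* Hint form */
SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u WITH (NOLOCK);

/* Isolation level form */
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u;

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;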

That warning often comes with a lot of questions about how to fix blocking problems so you can get rid of those hints.

After all, if you get rid of them, your SQL Server queries will (most likely) go back to using the READ COMMITTED isolation level and we all know that read committed is a garbage isolation level, anyway.

Cause and Wrecked


An important thing to understand is why the hint was used in the first place. I’ve worked with some nice developers who slapped it on every query just in case.

There was no blocking or deadlocking. They just always used it, and never stopped.

Not that I blame them; the blocking that can occur under read committed the garbage isolation level is plain stupid, and no respectable database platform should use it as a default.

In many ways, it’s easier for a user to re-run a query and hopefully get the right result and shrug and mumble something about computers being awful, which is also true.

So, first step: ask yourself if there was ever really a blocking problem to begin with.

Bing Tutsby


Next, we need to understand where the blocking was coming from. Under read committed the garbage isolation level, writers can block readers, and readers can block writers.

In most cases though, people have added the hint to all of their queries, even ones that never participated in blocking.

  • If the problem was writers blocking writers, no isolation level can help you.
  • If the problem was readers blocking writers, you may need to look at long running queries with Key Lookups.

If the problem was writers blocking readers, you’d have to look at a few things:

If you have query store enabled, you can use sp_QuickieStore to search it for queries that do a lot of writes. If you don’t, you can use sp_BlitzCache to search the plan cache for them.
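
As a sketch, both procedures take a sort order parameter for exactly this; the parameter names below are from their documentation, so double check them against the versions you have installed.

/* Query Store */
EXEC dbo.sp_QuickieStore
    @sort_order = 'writes';

/* Plan cache */
EXEC dbo.sp_BlitzCache
    @SortOrder = 'writes';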

Best Case


Of course, you can avoid all of these problems, except for writers blocking writers, by using an optimistic isolation level like Read Committed Snapshot Isolation or Snapshot Isolation.

In the past, people made a lot of fuss about turning these on, because

  • You may not have tempdb configured correctly
  • You have queue type code that relied on blocking for correctness

But in reasonably new versions of SQL Server, tempdb’s setup is part of the install process, and the wacky trace flags you used to have to turn on are the default behavior.

If you do have code in your application that processes queues and relies on locking to correctly process them, you’re better off using locking hints in that code, and using an optimistic isolation level for the rest of your queries. This may also be true of triggers that are used to enforce referential integrity, which would need READCOMMITTEDLOCK hints.
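
A minimal sketch of both halves of that advice; the database name and queue table are made up for illustration.

/* Optimistic reads for everyone else (this needs a brief moment of exclusive access). */
ALTER DATABASE YourDatabase
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;

/* Queue-processing code opts back into pessimistic locking where it matters. */
SELECT TOP (1)
    q.Id
FROM dbo.YourQueueTable AS q
    WITH (READCOMMITTEDLOCK, UPDLOCK, READPAST)
ORDER BY
    q.Id;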

The reason why they’re a much better choice than using uncommitted isolation levels is because rather than get a bunch of dirty reads from in-flight changes, you read the last known good version of the row before a modification started.

This may not be perfect, but it will prevent the absolute majority of your blocking headaches. It will even prevent deadlocks between readers and writers.

No, Lock


If your code has a lot of either NOLOCK hints or READ UNCOMMITTED usage, you should absolutely be worried about incorrect results.

There are much better ways to deal with blocking, and I’ve outlined some of them in this post.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Signs It’s Time To Switch From ORMs To Stored Procedures In SQL Server Development

Meet Wall


There are, unfortunately, some times when developers refuse to put the ORM down.

I mean, it’s mostly unfortunate for them, because they’ll continue to have performance problems.

Me? I’ll be okay.

The items in this post are issues I’ve run into constantly when working with people who use ORMs, but don’t spend any time looking at the queries they generate.

I expect this list to grow. Heck, maybe it’ll even get some good comments that I can add to the list.

I know, I know. Good comments.

Unreasonable


Here’s the stuff I see quite frequently causing issues with ORM code, in no particular order, and they’re quite often signs that you’d be better off with a stored procedure.

1. Your select list doesn’t fit in the cached plan

Developers using ORMs often have to learn the hard way that when they poke an object, all the columns come out. Not only does the lesson seem hard to learn, the behavior seems hard to change. I’ve worked with people months and years apart and found the same bad habits over and over again, and this isn’t an exception. Not only can this hurt query performance for a number of reasons, but it also makes reproducing any issues really difficult because you can’t get the full query text easily.

2. Your queries generate long IN lists

It’s bad enough that most ORMs don’t deal gracefully with this by parameterizing the IN clause values. Even if you do parameterize them all, SQL Server might have different ideas about how best to apply that IN clause. Scroll down to the “Complexity” section in this blog post. You can get wildly different plans depending on how many search arguments you pass in. This is one of those times where a table valued parameter, temp table, or other materialization of the list is a way better idea.
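
A sketch of the temp table flavor, with made up values standing in for whatever the application sends:

CREATE TABLE #ids (Id int PRIMARY KEY);

INSERT
    #ids (Id)
VALUES
    (1), (2), (3); /* ...and so on, from the app */

SELECT
    u.Id,
    u.DisplayName
FROM dbo.Users AS u
JOIN #ids AS ids
    ON ids.Id = u.Id;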

3. You don’t understand the query it generates

Because you-know-who, did you-know-what, with you-know-who, but let’s keep that between me and you isn’t a good way to send queries to SQL Server. The number of far too complicated queries full of unrecognizable logic that I’ve seen has, at this point, probably taken on physical weight in the world beyond mere electrons. The 20 lines of code you wrote to explain what you want have turned into a 200 line query full of derived left joins to every single imaginable relation in the database. Why? I don’t know. You don’t know either.

4. You can’t get a good query plan

You see that paragraph up there? That’s why you can’t get a good query plan. The optimizer spent a reasonable amount of time assessing all of the crazy semantic needs of your query and came up with a reasonable plan as quickly as possible. But somewhere along the line, it misjudged something, or didn’t have time to explore that one last join reordering that would have made everything okay. Or maybe, like a query that generates a long IN clause, this monster would benefit from breaking the million-layer-dip of logic up by dumping some initial results into a #temp table.

5. You can’t get developers to fully parameterize queries

When you write queries that take parameters, whether it’s a stored procedure or dynamic SQL, you get better plan re-use. When you throw literal values into the mix, the optimizer is far less charitable, and will treat each query like it has never seen it before and go about compiling a brand new execution plan for it. Even if you turn on Forced Parameterization, your semi-parameterized queries won’t be parameterized. And of course, Optimize For Ad Hoc Workloads won’t help once the plan moves beyond stub-status.
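
For comparison, a minimal sketch of what fully parameterized looks like when it’s dynamic SQL rather than a stored procedure:

DECLARE
    @sql nvarchar(MAX) = N'
SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.Reputation = @r;',
    @r int = 2;

EXEC sys.sp_executesql
    @sql,
    N'@r int',
    @r;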

Move On


There are many ways to tune a query, but unfortunately a good portion of them are unavailable while using ORMs in their natural code-only state. Sure, you can write custom queries in the code, but that has a lot of potential downsides, too. Depending on how the code is constructed, and if parameters are strongly typed, you may not get consistent plan re-use.

I’m all for application developers using tooling that enables them to work on new features quickly and in a way that they’re comfortable with. But at some point, SQL developers or DBAs need to step in and enforce coding standards. At some point, mom and dad have to take the keys away and implement something that performs beyond just what “works”.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

I/O Bound vs CPU Bound Queries In SQL Server

Handcuffed


When you’re looking for queries to tune, it’s important to understand which part is causing the slowdown.

That’s why Actual Execution plans are so valuable in newer versions of SQL Server and SSMS. Getting to see operator timing and wait stats for a query can tell you a lot about what kind of problem you’re facing.

Let’s take a look at some examples.

Diskord


If you’re looking at a query plan, and all the time is spent way to the right, when you’re reading from indexes, it’s usually a sign of one or two things:

  • You’re missing a good index for the query
  • You don’t have enough memory to cache data relevant to the query

If you run the query a second time, and it has the same characteristics — meaning we should now have the data cached in the buffer pool but we don’t — then one or both of those two things is true.

If you run the query a second time and it’s fast because all the data we care about is cached, then it’s more likely that only the second thing is true.

[SQL Server query plan: wasabi]

For example, every time I run this query it takes 20 seconds, because every time it has to read the clustered index from disk into memory to get a count. That’s because my VM has 96 GB of RAM, and the clustered index of the Posts table is about 120 GB. I can’t fit the whole thing into the buffer pool, so each run of this query has the gas station sushi effect on the buffer pool.

If I add a nonclustered index — and keep in mind I don’t really condone adding single key column nonclustered indexes like this — the query finishes much faster, because the smaller nonclustered index takes less time to read, and it fits into the buffer pool.

CREATE INDEX pr ON dbo.Posts
(
    Id
);

[SQL Server query plan: birthday]

If our query had different characteristics, like a where clause, join, group by, order by, or windowing function, I’d consider all of those things for the index definition. Just grabbing a count can still benefit from a smaller index, but there’s nothing relational that we need to account for here.

Proc Rock


Let’s say you already have ideal indexes for a query, but it’s still slow. Then what?

There are lots of possible reasons, but we’re going to examine what a CPU bound query looks like. A good example is one that needs to process a lot of rows, though not necessarily return a lot of rows, like a count or other aggregate.

[SQL Server query plan: splish splash]

While this query runs, CPUs are pegged like suburban husbands.

[SQL Server query plan: in the middle of the street]

For queries of this stature, inducing batch mode is often the most logical choice. Why? Because CPU instructions are run over batches of rows at once, rather than a single row at a time.

With a small number of rows — like in an OLTP workload — you probably won’t notice any real gains. But for this query that takes many millions of rows and produces an aggregate, it’s Hammer Time™
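
The post doesn’t show which mechanism was used here. On SQL Server 2019 (Enterprise Edition), the usual prerequisite is just a modern compatibility level, which lets batch mode on rowstore kick in without touching the query. On earlier versions, a common workaround is joining to an empty columnstore-indexed table, which does change the query text; both are sketched below.

/* SQL Server 2019+: batch mode on rowstore needs compatibility level 150 or higher. */
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 150;

/* Pre-2019 nudge: an empty columnstore table, left joined on a predicate that's never true,
   e.g. LEFT JOIN #batch_mode_nudge AS bmn ON 1 = 0 added to the slow query. */
CREATE TABLE #batch_mode_nudge
(
    i int NULL,
    INDEX c CLUSTERED COLUMNSTORE
);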

[SQL Server query plan: known as such]

Rather than ~30 seconds, we can get our query down to ~8 seconds without making a single other change to the indexes or written form.

Under Compression


For truly large data sets, compressing indexes is a great choice for further reducing the I/O bound portions of queries. In SQL Server, you have row, page, and column store (clustered and nonclustered) compression available to you, based on the type of workload you’re running.
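
For example, rebuilding the index created earlier with page compression; whether row or page wins depends on your data and how it’s read and written:

ALTER INDEX pr ON dbo.Posts
REBUILD WITH (DATA_COMPRESSION = PAGE);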

When you’re tuning a query, it’s important to keep the type of bottleneck you’re facing in mind. If you don’t, you can end up trying to solve the wrong problem and getting nowhere.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Why You Should Stop Looking At Query Costs In SQL Server

Spare No Expense


Over the years, there have been a lot of requests to get sp_BlitzCache to sort results by query cost. I understand why. It’s assumed that the optimizer is never wrong and that cost is directly associated with poor performance.

There are also rather misguided efforts to figure out parallelism settings based on plan costs. The main problem with that being that if you currently have a lot of parallel queries, all that means is that the estimated cost of the serial plan was higher than your current Cost Threshold For Parallelism setting, and the cost of the parallel plan was less than the cost of the serial plan.

If you increase Cost Threshold For Parallelism, you may very well still end up with a parallel plan, because the serial version was still more expensive. If you eventually change Cost Threshold For Parallelism to the point where some queries are no longer eligible for parallelism, you may eventually find yourself unhappy with the performance of the serial version of the query plan.

Albeit with less overall wait time on CX* doodads.

Next you’ll be complaining about all the SOS_SCHEDULER_YIELD waits you’ve got.

Insteads


Rather than look at estimated metrics, you should be looking at how queries actually perform. For most servers I look at, that means looking at queries with high average CPU time, and large memory grants. Those metrics typically represent tunable aspects of the query.

In other cases, you might look at wait stats to direct the type of queries you want to go after. Reads, writes, and executions are also valuable metrics at times.

One danger of looking at totals rather than averages is that you may find things that do a little bit of something a whole lot of times, and there’s no real way to tune the small bit of activity they generate other than to run the query less.

What’s A Cost For?


In general, I only tend to look at costs to figure out plan choices within a query, or when comparing two different plans for “the same” query.

This is where experimenting with hints to change the plan shapes and choices can show you why you got the plan you did, and what you might have to do to get the plan you want naturally.

Let’s say you want to figure out why you got a specific join type. You hint the type of join you want, and there’s a missing index request now. Adding the index gets you the plan shape you want without the hint. Everyone lived happily ever after.

Until the index got fragmented.
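
A sketch of that kind of experiment, using the demo tables from the other posts here; the hint only exists to see how the plan and the missing index requests change:

SELECT
    u.Id,
    u.DisplayName
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
WHERE u.Reputation >= 100000
OPTION (HASH JOIN); /* or LOOP JOIN / MERGE JOIN, depending on what you're testing */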

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Why CASE Expressions Are Bad For Query Performance In SQL Server Join And Where Clauses

Off And On


I spend a lot of time fixing queries like this, so I figured that I’d write about it in hopes that I can save someone some time and pain later on.

Obviously, this extends to join clauses as well. A case expression there has just as much chance of causing performance issues.

This pattern crops up in some ORM queries that I’ve seen as well, but I haven’t been able to figure out the code that causes it.

Let’s look at why this causes problems!

Tractor Trailer


To give our query the best possible chance of not sucking, let’s create some indexes.

CREATE NONCLUSTERED INDEX p
ON dbo.Posts 
(
    PostTypeId,
    Score, 
    OwnerUserId
);

CREATE NONCLUSTERED INDEX u
ON dbo.Users 
(
    Reputation
)
INCLUDE 
(
    DisplayName
);

With those in place, let’s look at a simple example query.

SELECT
    u.Id,
    u.DisplayName,
    s = SUM(p.Score)
FROM dbo.Posts AS p
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE 1 = 
    CASE 
        WHEN p.PostTypeId = 1
        AND  p.Score >= 10
        THEN 1
        ELSE 0
    END
GROUP BY
    u.Id,
    u.DisplayName
HAVING SUM(p.Score) > 10000
ORDER BY s DESC;

The plan reveals textbook symptoms of a lack of SARGability: an index scan with a predicate, despite a perfectly seekable index being in place:

[SQL Server query plan: jumbo]

Shame about that! But we can make things worse, too.

The Worser


If we involve a new column in the case expression, this time from the Users table, the predicate will be applied at a really unfortunate place in the query plan.

SELECT
    u.Id,
    u.DisplayName,
    s = SUM(p.Score)
FROM dbo.Posts AS p
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE 1 = 
    CASE 
        WHEN p.PostTypeId = 1
        AND  p.Score >= 10
        AND  u.Reputation > 5000
        THEN 1
        ELSE 0
    END
GROUP BY
    u.Id,
    u.DisplayName
HAVING SUM(p.Score) > 10000
ORDER BY s DESC;

Now all the filtering happens at the join, and the query goes from taking about 1 second to taking about 5 seconds.

[SQL Server query plan: close face]

If you write queries like this, you’re asking for trouble.

Why Can’t You Just Be Normal?


If we express that logic without a case expression, performance turns out much better. Shocking, right?

SELECT
    u.Id,
    u.DisplayName,
    s = SUM(p.Score)
FROM dbo.Posts AS p
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score >= 10
AND   u.Reputation > 5000
GROUP BY
    u.Id,
    u.DisplayName
HAVING SUM(p.Score) > 10000
ORDER BY s DESC;

This query takes about 240 milliseconds, which is a nice improvement.

[SQL Server query plan: happy taste]

We’re able to seek into our Super Helpful Index™ on the Posts table. Now I know what you’re thinking — we’re hitting the clustered index on the Users table — that must be horrible.

But no; because the Nested Loops Join is of the Apply variety, it makes more sense to use it to seek to a single Id, and evaluate the predicate on Reputation second.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.