Performance Issues With EXISTS Queries In SQL Server

Dos Puntos


Look, I really like EXISTS and NOT EXISTS. I do. They solve a lot of problems.

This post isn’t a criticism of them at all, nor do I want you to stop using them. I would encourage you to use them more, probably.

But there’s some stuff you need to be aware of when you use them, whether it’s in control-flow logic, or in queries.

If you keep your head about you, you’ll do just fine.

IF EXISTS


The issue you can hit here is one of row goals. And a T-SQL implementation shortcoming.

If I run this query, it’ll chug along for about 10 seconds.

IF EXISTS
(
    SELECT 
       1/0
    FROM dbo.Posts AS p
    JOIN dbo.Votes AS v
        ON  p.Id = v.PostId
    WHERE v.VoteTypeId = 1
    AND   v.CreationDate >= '2018-12-01'
    AND   p.PostTypeId = 1
)
BEGIN
    SELECT x = 1;
END;

The part of the plan that we care about is a seek into the Votes table.

[SQL Server Query Plan: eviction]

SQL SERVER’S COST BASED OPTIMIZER™ thinks that 2.52 (rounded to 3) rows will have to get read to find data we care about, but it ends up having to do way more work than that.

It’s worth a short detour here to point out that when you’re tuning a slow query, paying attention to operator costs can be a real bad time. The reason this query is slow is that the costing was wrong, and it shows. Costed correctly, you would not get this plan. You would not see the majority of the query execution time spent in the lowest-costed non-zero operator.

Normally, you could explore query hints to figure out why this plan was chosen, but you can’t do that in the context of an IF branch. That sucks, because a Hash Join hinted query finished in about 400ms. We could solve the problem with that hint, or by disabling row goals for the query.
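To try those out, you can run the EXISTS logic as a standalone statement, where hints are legal. This is just a sketch against the same tables; the 400ms timing above came from the original tests, not these exact statements.

/* Hints aren't allowed on IF EXISTS, but they are here. */
SELECT
    x = 1
WHERE EXISTS
      (
          SELECT
              1/0
          FROM dbo.Posts AS p
          JOIN dbo.Votes AS v
              ON  p.Id = v.PostId
          WHERE v.VoteTypeId = 1
          AND   v.CreationDate >= '2018-12-01'
          AND   p.PostTypeId = 1
      )
OPTION(HASH JOIN); /* or OPTION(USE HINT('DISABLE_OPTIMIZER_ROWGOAL')) */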

Fixing It


In order to tune this, we need to tinker with the logic a little bit. Rather than put a query in the IF EXISTS, we’re going to set a variable based on the query, and use the IF logic on that, instead.

DECLARE
    @do_it bit;

SELECT
    @do_it = 
    (
        SELECT 
            CONVERT
            (
                bit,
                ISNULL
                (
                    MAX(1), 
                    0
                )
            )
        FROM dbo.Posts AS p
        JOIN dbo.Votes AS v
            ON  p.Id = v.PostId
        WHERE v.VoteTypeId = 1
        AND   v.CreationDate >= '2018-12-01'
        AND   p.PostTypeId = 1
    )
OPTION(HASH JOIN);

IF @do_it = 1
BEGIN
    SELECT x = 1;
END;

This produces the fast plan that we’re after. You can’t use a CASE expression here and get a hash join though, for reasons explained in this post by Pablo Blanco.

But here it is. Beautiful hash join.

[SQL Server Query Plan: blown]

EXISTS With OR Predicates


A common query pattern is to use EXISTS… OR EXISTS to sort out different things, but you can end up with a weird optimizer query rewrite (SplitSemiApplyUnionAll) that looks a lot like the LEFT JOIN… IS NULL pattern for finding rows that don’t exist. Which is generally a bad pattern, as discussed in the linked post.

Anyhoo.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.Reputation > 1000000
AND   EXISTS
      (
          SELECT
              1/0
          FROM dbo.Badges AS b
          WHERE b.UserId = u.Id
      )
OR    EXISTS
      (
          SELECT
              1/0
          FROM dbo.Comments AS c
          WHERE c.UserId = u.Id
      );

This is what I’m talking about, in the plan for this query.

[SQL Server Query Plan: made for the movies]

Rather than do two semi joins here for the EXISTS, we get two right outer joins. That means (like in the linked post above), all rows between tables are joined, and filters are applied much later on in the plan. You can see one of the right outer joins, along with the filters (on expressions!) in the nice picture up there.

Fixing It


The fix here, of course (of course!) is to write the query in a way that the optimizer can’t apply that foolishness to.

SELECT
    c = SUM(x.c)
FROM 
(
    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Users AS u
    WHERE u.Reputation > 1000000
    AND   EXISTS
          (
              SELECT
                  1/0
              FROM dbo.Badges AS b
              WHERE b.UserId = u.Id
          )
    
    UNION ALL
    
    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Users AS u
    WHERE EXISTS
          (
              SELECT
                  1/0
              FROM dbo.Comments AS c
              WHERE c.UserId = u.Id
          )
) AS x;

This query completes in around 1.5 seconds, compared to 4.9 seconds above.

[SQL Server Query Plan: explored]

Seasoned Veteran


It’s rough when you run into these problems, because solutions aren’t always obvious (obvious!), nor is the problem.

Most of the posts I write about query tuning arise from issues I solve for clients. While most performance problems come from predictable places, sometimes you have to deal with edge cases like this, where the optimizer mis-costs things.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

The Right Way To Check For NULLs In SQL Server Queries

101ers


This is still one of the most common problems I see in queries.

People are terrified of NULLs. People are afraid to merge on freeways in Los Angeles.

What results is this endless stream of poorly performing queries, and some surprise logic bugs along the way.

I don’t have much else of an intro. The TL;DR is that you should use natural expressions like IS NULL or IS NOT NULL, rather than any of the built-in functions available to you in SQL Server, like ISNULL, COALESCE, et al., which are presentation-layer functions with no relational meaning whatsoever.

From here on out, we’ll be calling them unnatural expressions. Perhaps that will get through to you.

Tuning Wizard


First is something I’ve covered before, but when you use unnatural expressions, the optimizer will not give you feedback about useful indexes.

[SQL Server Query Plan: tenting]

The first query generates a missing index request, the second one does not. The optimizer has abandoned all hope with the use of an unnatural expression.
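The two queries behind that screenshot follow this shape. It’s a sketch; I’m borrowing the Votes.BountyAmount setup used later in this post, so treat the exact predicates as assumptions.

/* Generates a missing index request: the predicate is seekable. */
SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.BountyAmount IS NULL;

/* No missing index request: the function wraps the column. */
SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE ISNULL(v.BountyAmount, '') = '';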

Lethargy


The other issue with unnatural expressions comes down to implicit conversion.

Take this, for instance.

DECLARE 
    @i int = 0;

SELECT 
    c = 
        CASE ISNULL(@i, '') 
             WHEN '' 
             THEN 1 
             ELSE 0 
        END;

This will return a 1, because 0 and '' can be implicitly converted to one another.

Perhaps less obvious, and more rare, is this:

DECLARE 
    @d datetime = '19000101';

SELECT 
    c = 
        CASE ISNULL(@d, '') 
             WHEN '' 
             THEN 1 
             ELSE 0 
        END;

Which will also return 1.

Not many databases have stuff going back to 1900, but I do see people using that as a canary value often enough.

Perfidy


If that’s not enough to get you off the idea, let’s look at how this stuff plays out in the real world.

First, let’s get ourselves an index. Without that, there’s fundamentally no difference in performance.

CREATE INDEX v ON dbo.Votes
    (BountyAmount);

Our gold standard will be these two queries:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.BountyAmount IS NULL;

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.BountyAmount IS NOT NULL;

The first one that checks for NULL values returns a count of 182,348,084.

The second one that checks for NOT NULL values returns a count of 344,070.

Keep those in mind!

The query plans for them both look like this:

[SQL Server Query Plan: jumbotron]

Which run, respectively (and respectably), in 846ms and 26ms. Obviously the query with the more selective predicate will have a time advantage, here.

Wrongly


Here’s where things start to go wrong.

This query returns incorrect results, but you’re probably used to that because of all the NOLOCK hints in your queries anyway.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE ISNULL(v.BountyAmount, '') = '';

A count of 182,349,088 is returned rather than 182,348,084, because there are 1004 rows with a bounty of 0.

Even though we have an empty string in our query, it’s implicitly converted to 0.
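You can check that conversion on its own:

/* The empty string implicitly converts to 0 for integer types. */
SELECT
    converted = CONVERT(int, '');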

[SQL Server Query Plan: checked]

And you thought you were so clever.

Badly


The exercises in futility that I see people carrying on with often make use of ISNULL, COALESCE, and CASE expressions.

It’s worth noting here that COALESCE is only a CASE expression underneath, anyway. They are interchangeable in this respect.
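If you want to see that for yourself, COALESCE(v.BountyAmount, -1) parses out to a CASE expression along these lines, and behaves identically here:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE CASE
          WHEN v.BountyAmount IS NOT NULL
          THEN v.BountyAmount
          ELSE -1
      END = -1;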

For finding NULLs, people will screw up and do this:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE ISNULL(v.BountyAmount, -1) = -1;

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE COALESCE(v.BountyAmount, -1) = -1;

We can use -1 here because it doesn’t naturally occur in the data. Results are correct for both, but performance is comparatively horrible.

[SQL Server Query Plan: up high]

We’re looking at 2.5 seconds compared to 900ms. This situation gets worse with the more selective predicates, too.

[SQL Server Query Plan: down low]

These both take roughly the same time as the other unnatural forms of this query, but recall the natural version of this query finished in under 30ms.

Deadly


I hope I don’t have to write about this anymore, but at the rate I see people doing this stuff, I kind of doubt it.

Broken Record Enterprises, it feels like sometimes.

I’m not sure why anyone thinks this is a good idea. I’ve heard rumors that it comes from application developers who are used to NULLs throwing errors in their code, writing SQL queries where NULLs don’t pose the same threat.

Who knows, though. Maybe people just really like the festive pink text color that functions turn in SSMS.

Thanks for reading!


Dealing With Wide Missing Index Requests In SQL Server

We’ve All Been There


You’re running a query that selects a lot of columns, and you get a missing index request.

For the sake of brevity, let’s say it’s a query like this:

SELECT
    p.*
FROM dbo.Posts AS p
WHERE p.ParentId = 0;

The missing index request I get for this query is about like so:

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Posts] ([ParentId])
INCLUDE ([AcceptedAnswerId],[AnswerCount],[Body],[ClosedDate],[CommentCount],[CommunityOwnedDate],
[CreationDate],[FavoriteCount],[LastActivityDate],[LastEditDate],[LastEditorDisplayName],[LastEditorUserId],
[OwnerUserId],[PostTypeId],[Score],[Tags],[Title],[ViewCount])

But that’s laughable, because it’s essentially a shadow clustered index. It’s every column in the table ordered by <some column>.

And Again


Under many circumstances, you can trim all those included columns off and make sure there’s a usable index with ParentId as the leading column.

I’m not a fan of single key column indexes most of the time, so I’d avoid that practice.
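Trimmed down, the request becomes something like this. The second key column is my own addition, per the above, and purely illustrative:

/* ParentId leading, no kitchen-sink INCLUDE list.
   Score as a second key column is an assumption, not from the request. */
CREATE NONCLUSTERED INDEX p
ON dbo.Posts
    (ParentId, Score);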

But sure, if you have reasonably selective predicates, you’ll get a decent seek + lookup plan. That’s not always going to be the case, though, and for various reasons you may end up getting a poor-enough estimate on a reasonably selective predicate, which will result in a bad-enough plan.

Of course, other times you may not have very selective predicates at all. Take that query up there, for example. There are 17,142,169 rows in the Posts table (2013), and 6,050,820 of them qualify for our predicate on ParentId.

This isn’t a case where I’d go after a filtered index, either, because it’d only be useful for this one query. And it’d still be really wide.

There are four string columns in there, all nvarchar.

  • Title (250)
  • Tags (150)
  • LastEditorDisplayName (40)
  • Body (max)

Maybe Something Different


If I’m going to create an index like that, I want more out of it than I could get with the one that the optimizer asked for.

On a decently recent version of SQL Server (preferably Enterprise Edition), I’d probably opt for creating a nonclustered column store index here.

You get a lot of benefits from that, which you wouldn’t get from the row store index.

  • Column independence for searching
  • High compression ratio
  • Batch Mode execution

That means you can use the index for better searching on other predicates that aren’t terribly selective, the data source is smaller and less likely to be I/O bound, and batch mode is aces for queries that process a lot of rows.
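Here, that might look something like the following. The column list is my best guess at broadly useful columns, and Body stays out, because nvarchar(max) columns aren’t allowed in a nonclustered column store index:

CREATE NONCLUSTERED COLUMNSTORE INDEX ncc_posts
ON dbo.Posts
    (ParentId, OwnerUserId, PostTypeId, Score, CreationDate, LastActivityDate);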

Column store indexes still have some weird limitations and restrictions. Especially around data types and included columns, I don’t quite understand why there isn’t better parity between clustered and nonclustered column store.

Thanks for reading!


How To Use sp_HumanEvents To Troubleshoot A Slow Stored Procedure In SQL Server

Number One Record


You’ve got a stored procedure that runs slowly, but the problem is that it’s… long. It’s really hard to tell which part is slow, sometimes.

With shorter procedures you can probably just collect actual execution plans and slam F5 like a tall glass of gin at 6am.

But you don’t wanna do that with the larger procedures, for a few practical reasons:

  • Lots of little queries run quickly, and we don’t care about those
  • Navigating through lots of plans in SSMS is tedious
  • There’s no differentiation when other procedures, etc. are invoked
  • You introduce a lot of overhead retrieving and rendering all those plans
  • The full query text might not be captured, which is a limitation in many places

Let’s save the day with sp_HumanEvents, my stored procedure that makes using Extended Events really easy.

Wanna Ride


There are a lot of issues you can run into with Extended Events. They’re rather unpleasant, and there’s almost zero guidance from Microsoft about usage.

Wouldn’t it be nice to just hit F5?

Once you have a window open with your procedure ready to run, take note of the session id that it’s using, and tailor this command for your situation:

EXEC sp_HumanEvents
    @event_type = 'query',                   
    @query_duration_ms = 1000,               
    @session_id = N'58',                    
    @keep_alive = 1;

Let’s break down what this does:

  • The events we want to focus on are for executed queries
  • We only care about queries that run for more than one second
  • We’re going to focus in on the session id for the query window we’re tuning in
  • We want the session to stay running so we can watch and re-watch after we make changes

I know it looks a little funny that @session_id is a string, but that’s because there are some sampling capabilities if you want to look at a workload instead of a single procedure without overwhelming a server by collecting everything.
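If you do want to sample a workload instead, the call looks more like this. I’m going from memory on the sampling parameter, so check the sp_HumanEvents documentation before relying on it:

EXEC sp_HumanEvents
    @event_type = 'query',
    @query_duration_ms = 1000,
    @session_id = N'sample',   /* sample sessions rather than target one */
    @sample_divisor = 5,       /* roughly one in five sessions */
    @keep_alive = 1;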

Once that’s done, you’ll have a new Extended Event session like this, and you’ll wanna watch live data from it:

[SQL Server Extended Events: big bux]

Watch Out Now


Once you’ve got that window open and you run your procedure, you’ll see any queries that meet the duration criteria, and you should see something that looks like this.

[SQL Server Extended Events: good for you]

The procedure that I’m looking at in here is sp_BlitzCache, because it’s a good example of a procedure with a lot of queries in it, where only some of them (like the XML parsing) might get slowed down.

Some notes on the output:

  • The statement doesn’t get collected with the query plan (more on that in a second)
  • INSERT…EXEC shows up as two statements (lines 2 and 3 over there)

You only see these two columns at first, but you can go through and add in any other columns that you find useful from each specific session type.

Two Point Two


The output can be a little confusing at first. Generally the pattern is query plan, then statement associated with it.

If you click on each event, you can choose different elements from it to show in the main result window.

[SQL Server Extended Events: like you]

Here’s how I usually set things up to find what I wanna go after:

[SQL Server Extended Events: get it]

I like to grab CPU, duration, information about requested, granted, and used memory, and the statement.

Not every element is available in every event. Where things are NULL, that’s where they’re not available. I wish there were a general extended event that captured everything I need in one go, but that doesn’t seem to exist.

One last point on the output is that if you click on the query_post_execution_showplan lines, the query plan is available in the second result set:

[SQL Server Query Plan: one time]

Crossover


Before you ask, the lightweight profiling events are useless. They’re so nerfed and collect so little helpful detail that you might as well be looking at a cached plan or Query Store plan.

Lightweight, baby 🙄

To that point, I don’t recommend running something like this across a whole workload. Though Extended Events are purported to have far less observer overhead than Profiler, I’ve seen even heavily-filtered sessions like this slow workloads down quite a bit.

That goes for other traces and forms of monitoring as well. Don’t think this is a special case.

Anyway, this is how I work when I’m trying to tune things for clients. Hopefully you find it useful as well.

Grab sp_HumanEvents and lemme know how it goes on GitHub.

Thanks for reading.


An Undocumented Restriction For Parallel Inserts In SQL Server That Can Kill Performance

Insert Card


I’d like to start this post off by thanking my co-blogger Joe Obbish for being lazy and not blogging about this when he first ran into it three years ago.

Now that we’re through with pleasantries, let’s talk turkey.

Over in this post, by Arvind Shyamsundar, which I’m sure Microsoft doesn’t consider official documentation since it lacks a GUID in the URL, there’s a list of… things about parallel inserts.

  • Just as it is with SQL Server 2016, in order to utilize the parallel insert in Azure SQL DB, do ensure that your compatibility level is set to 130. In addition, it is recommended to use a suitable SKU from the Premium service tier to ensure that the I/O and CPU requirements of parallel insert are satisfied.
  • The usage of any scalar UDFs in the SELECT query will prevent the usage of parallelism. While usage of non-inlined UDFs are in general ‘considered harmful’ they end up actually ‘blocking’ usage of this new feature.
  • Presence of triggers on the target table and / or indexed views which reference this table will prevent parallel insert.
  • If the SET ROWCOUNT clause is enabled for the session, then we cannot use parallel insert.
  • If the OUTPUT clause is specified in the INSERT…SELECT statement to return results to the client, then parallel plans are disabled in general, including INSERTs. If the OUTPUT…INTO clause is specified to insert into another table, then parallelism is used for the primary table, and not used for the target of the OUTPUT…INTO clause.
  • Parallel INSERT is used only when inserting into a heap without any additional non-clustered indexes. It is also used when inserting into a Columnstore index.
  • Watch out when IDENTITY or SEQUENCE is present!

In the actual post, some of these points are spread out a bit; I’ve editorially condensed them here. Some of them, like OUTPUT and UDFs, I’ve blogged about a bazillion times over here.

Others may come as a surprise, like well, the rest of them. Hm.

But there’s something missing from here, too!

Lonesome


Let’s create a #temp table, here.

DROP TABLE IF EXISTS 
    #parallel_insert;

CREATE TABLE
    #parallel_insert
(
    id int NOT NULL
);

Now let’s look at a parallel insert. I’m using an auxiliary Numbers table for this demo because whatever, it’s my demo.

INSERT
    #parallel_insert WITH (TABLOCK)
(
    id
)
SELECT
    n.Number
FROM dbo.Numbers AS n
JOIN dbo.Numbers AS n2
    ON n2.Number = n.Number
OPTION
(
    USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE')
);

The query plan does exactly what we want it to do, and stays parallel through the insert.

[SQL Server Query Plan: heavy bags]

Referential


If we drop and re-create the #temp table, and then run this insert instead, that doesn’t happen:

INSERT
    #parallel_insert WITH (TABLOCK)
(
    id
)
SELECT
    n.Number
FROM dbo.Numbers AS n
JOIN dbo.Numbers AS n2
    ON n2.Number = n.Number
AND NOT EXISTS
    (
        SELECT
            1/0
        FROM #parallel_insert AS p
        WHERE p.id = n.Number
    )
OPTION
(
    USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE')
);

Note that the NOT EXISTS is against an empty table, and will not eliminate any rows. The optimizer estimates this correctly, and yet…

[SQL Server Query Plan: collect call]

The insert is in a fully serial zone. This happens because we reference the table that we’re inserting into in the select portion of the query. It’s no longer just the target for the select to insert into.

Zone Out


If you’re tuning queries like this and hit situations where this limitation kicks in, you may need to use another #temp table to stage rows first.
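Sticking with the demo tables above, the staged version might go something like this. Since neither INSERT references its own target in the SELECT, both should stay eligible for parallelism:

/* Stage the filtered rows first. */
CREATE TABLE
    #stage
(
    id int NOT NULL
);

INSERT
    #stage WITH (TABLOCK)
(
    id
)
SELECT
    n.Number
FROM dbo.Numbers AS n
JOIN dbo.Numbers AS n2
    ON n2.Number = n.Number
WHERE NOT EXISTS
      (
          SELECT
              1/0
          FROM #parallel_insert AS p
          WHERE p.id = n.Number
      );

/* Then move the staged rows into the real target. */
INSERT
    #parallel_insert WITH (TABLOCK)
(
    id
)
SELECT
    s.id
FROM #stage AS s;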

I’m not complaining about this limitation. I can only imagine how difficult it would be to guarantee correct results in these situations.

I do want to point out that the fully parallel insert finishes in around 250ms, and the serial zone insert finishes in 1.4 seconds. This likely isn’t the most damning thing, but in larger examples the difference can be far more profound.

Thanks for reading!


Inline Table Valued Functions And Catch All Queries In SQL Server

This Is Not A Solution


Right off the bat, I want you to know that this is not a solution, and I’ll get to why in a minute. I’m writing this mainly because every once in a while I’ll try something different to get this working, and it always ends up disappointing.

I wish I had better news for you, here. Hell, I wish I had better news for me here. But alas we’re at the mercy of parameters.

And yeah, I know, recompile, recompile, recompile. All the live long day. But I’ve seen some weird stuff happen with that too under high concurrency.

So what’s the point? Let’s talk about that.

Dot Dot Dot


CREATE INDEX p1 ON dbo.Posts(OwnerUserId, CreationDate);

CREATE INDEX p2 ON dbo.Posts(Score, LastActivityDate);

We need some indexes. That’s a fact. I’m intentionally creating them in this way to show you that SQL Server can sometimes be smart about catch all queries.

And here’s the inline table valued function we’ll be working with:

CREATE OR ALTER FUNCTION
    dbo.kitchen_sink
(
    @OwnerUserId int,
    @CreationDate datetime,
    @Score int,
    @LastActivityDate datetime
)
RETURNS table
AS
RETURN
SELECT
    c = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE 
        (p.OwnerUserId = @OwnerUserId OR @OwnerUserId IS NULL)
    AND (p.CreationDate >= @CreationDate OR @CreationDate IS NULL)
    AND (p.Score >= @Score OR @Score IS NULL)
    AND (p.LastActivityDate >= @LastActivityDate OR @LastActivityDate IS NULL);

This pattern usually eats the optimizer alive, and there’s a lot of posts about using dynamic SQL to fix it.

But when we call this function with literal values, it does just fine.

SELECT
    ks.c
FROM dbo.kitchen_sink(22656, '20130101', NULL, NULL) AS ks;

SELECT
    ks.c
FROM dbo.kitchen_sink(NULL, NULL, 100, '20130101') AS ks;

SELECT
    ks.c
FROM dbo.kitchen_sink(22656, NULL, NULL, '20130101') AS ks;

SELECT
    ks.c
FROM dbo.kitchen_sink(NULL, '20131225', NULL, '20131225') AS ks;

SELECT
    ks.c
FROM dbo.kitchen_sink(22656, NULL, NULL, '20131215') AS ks;

Das Plan


You can run those all yourself and look at the plans. I’m just gonna throw a couple of the more interesting examples in the post, though.

The first two queries do exactly what we’d hope to see.

[SQL Server Query Plan: sparkling]

We use the right indexes, we get seeks. Cardinality estimation is about as reliable as ever with the “””””default””””” estimator in place 🙄

And at one point, we even get a really smart index intersection plan where the optimizer uses both of our nonclustered indexes.

[SQL Server Query Plan: units]

Parameter Problem


The problem is that no one really makes database calls like that.

If you’re using an ORM, you could intentionally not parameterize your queries and get this to “work”, but there are downsides to that around the plan cache. Being honest, most plan caches are useless anyway.

Long Live Query Store, or something.

Most people have their catch all code parameterized, so the query looks like what’s in the function. I’m going to throw the function in a stored procedure now.

CREATE OR ALTER PROCEDURE
    dbo.kitchen_wrapper
(
    @OwnerUserId int,
    @CreationDate datetime,
    @Score int,
    @LastActivityDate datetime
)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

    SELECT
        ks.c
    FROM dbo.kitchen_sink
    (
        @OwnerUserId, 
        @CreationDate, 
        @Score, 
        @LastActivityDate
    ) AS ks;

END;

If we execute the proc like this, everything goes to hell rather quickly.

EXEC dbo.kitchen_wrapper
    @OwnerUserId = 22656,          
    @CreationDate = '20131215',    
    @Score = NULL,                  
    @LastActivityDate = NULL;

EXEC dbo.kitchen_wrapper
    @OwnerUserId = NULL,
    @CreationDate = NULL, 
    @Score = 100, 
    @LastActivityDate = '20131215';

Baywatch


The first execution uses the “right” index, but we lose our nice index seek into the p1 index.

[SQL Server Query Plan: barfbag]

We also end up with Predicates on the Key Lookup, just in case they end up not being NULL. And boy, when they end up not being NULL, we end up with a really slow query.

[SQL Server Query Plan: me one too]

We re-use the execution plan we saw before, because that’s how SQL Server works. But since those parameters are NULL now, we don’t filter any rows from p1, and we pass all 17 million rows to the key lookup to filter them there. And since it’s a Nested Loops Join, we do it… one row at a time.

Fun.

Floss Too Much


There’s no great fix for this, either. This is a problem we’re stuck with when we write queries this way without using dynamic SQL, or a recompile hint.

I’ve seen people try all sorts of things to “fix” this problem. Case expressions, ISNULL and COALESCE, magic values, and more. They all have this exact same issue.

And I know, recompile, recompile, recompile.
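For completeness, here’s the recompile flavor, with all of the high-concurrency caveats from the top of the post. The proc name is mine:

CREATE OR ALTER PROCEDURE
    dbo.kitchen_wrapper_recompile
(
    @OwnerUserId int,
    @CreationDate datetime,
    @Score int,
    @LastActivityDate datetime
)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

    SELECT
        ks.c
    FROM dbo.kitchen_sink
    (
        @OwnerUserId, 
        @CreationDate, 
        @Score, 
        @LastActivityDate
    ) AS ks
    OPTION(RECOMPILE); /* new plan for each parameter combination */

END;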

Thanks for reading!


A SARGability Riddle In SQL Server: Why Do Some Queries Seek and Some Queries Scan?

Use The Force


I had a RABID FAN ask me an interesting question about a query. I can’t use theirs, but I can repro the question.

The question was: if using ISNULL in a where clause isn’t SARGable, how come I can use a FORCESEEK hint in this query?

Formatting and linking my own, of course.

The query looked something like this:

SELECT
    c = 
        COUNT_BIG(*)
FROM dbo.Users AS u WITH(FORCESEEK)
WHERE ISNULL(u.Id, 0) = 22656;
GO
[SQL Server Query Plan: seekable!]

What Gives?


The first thing you should notice is that the optimizer throws out ISNULL, here.

Why? Because the Id column isn’t NULL-able, and since ISNULL is Microsoft’s special non-ANSI baby, it can do this one special thing.

If we use COALESCE instead, we’ll get an error.

SELECT
    c = 
        COUNT_BIG(*)
FROM dbo.Users AS u WITH(FORCESEEK)
WHERE COALESCE(u.Id, 0) = 22656;
GO

Msg 8622, Level 16, State 1, Line 8
Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.

And if we try to use ISNULL on a NULL-able column like Age, we’ll get the same error:

SELECT
    c = 
        COUNT_BIG(*)
FROM dbo.Users AS u WITH(FORCESEEK)
WHERE ISNULL(u.Age, 0) = 22656;
GO

Coacase? Caselesce?


Under the covers, COALESCE is just a crappy CASE expression.

Without the FORCESEEK hint, we can get the query to actually run.

SELECT
    c = 
        COUNT_BIG(*)
FROM dbo.Users AS u
WHERE COALESCE(u.Id, 0) = 22656;
GO 
[SQL Server Query Plan: southa]

And ISNULL is just… ISNULL.

SELECT
    c = 
        COUNT_BIG(*)
FROM dbo.Users AS u
WHERE ISNULL(u.Age, 0) = 22656;
GO
[SQL Server Query Plan: ribs]

Thanks for reading!


You Probably Don’t Need To Offload SQL Server Reads To Another Server

Moneybags


Duplicating data for reporting, outside of moving it to a data warehouse where there’s some transformations involved, can be an expensive and perilous task.

Your options come down to a native solution like AGs, Replication, or Log Shipping. You can brew something up yourself that relies on native stuff too, like Change Data Capture, Change Tracking, Temporal Tables, or triggers.

Outside of that, you’re getting into the world of SSIS/ADF, other third party vendors, etc. to get data where it ought to go. That’s none of my business, and good luck to you.

As soon as you let people read that data, you have to fully license the SQL Server that it’s sitting on.

Alt Rock


I’ve talked to a few dozen people about this over the years, too. Most were shocked to learn about the licensing implications, thinking that having Software Assurance from Microsoft covered their needs.

The most frequent desire for offloading reads is real-time reporting, and the most frequently cited reason is that reporting queries put too much load on the current server hardware.

You can see where the conflict sits, here. People think they’re getting a free SQL Server to report off of, so they don’t need to up their current hardware and pay for it. Microsoft may be dumb, but it ain’t that dumb.

Once you get people past the licensing issues, tuning the current hardware becomes a more approachable subject.

Gestalt Block


Of course, the idea of tuning the reporting queries has occurred to most people, but the attempts at tuning are usually flailing attempts to solve problems that aren’t there.

  • Maybe this local variable will do a thing for parameter sniffing
  • NOLOCK is better for reports, anyway
  • Don’t forget to recompile, too
  • Add a DISTINCT, there’s too many rows
  • Throw up hands, rebuild indexes

While everything has its place, it’s rare that this combination will get you past all of your performance issues. Reporting queries are especially interesting to me, because of how few people embrace batch mode to any degree for them.
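If you’re not on a version with batch mode on row store, the classic way in for reporting queries is an intentionally empty filtered column store index. This is a general-purpose trick, and the table name here is hypothetical:

/* The impossible predicate keeps the index empty; its only job
   is to make batch mode available to queries on this table. */
CREATE NONCLUSTERED COLUMNSTORE INDEX ncc_batch_mode
ON dbo.ReportingFacts (Id)
WHERE Id = -1 AND Id = -2;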

The cost of tuning queries is far cheaper than the cost of licensing a whole other server, especially for Enterprise Edition customers. If you’re on Standard Edition, you have many other problems. Many, many other problems, standing in the way of scaling and performance issues.

If this is the kind of thing you need help with, drop me a line. This is exactly the kind of thing I do.

Thanks for reading!


An Annoying OPTIMIZE FOR Limitation In SQL Server

Infrequent


I have occasionally cheated a little and used OPTIMIZE FOR some_value to fix a parameter sniffing issue that didn’t have any other viable options available to it.

This is a pretty rare situation, but there’s a place for everything. Keep in mind that I’m not talking about UNKNOWN here. I’m talking about a real value.

Recently I had to fix a specific problem where cardinality estimates for datetime values would get completely screwed up if they were older than a day.

You’d be tempted to call this an ascending key problem, but it was really an ascending key solution. Whenever a query got an off histogram estimate, it chose a good plan — when it got a histogram step hit, the estimate was high by several million rows, and the plan looked like someone asked for all the rows in all the databases in all the world.

So, you go through the usual troubleshooting steps:

  • More frequent stats updates: uh oh, lots of recompiles
  • Stats updates with fullscan during maintenance: crapped out during the day
  • Various trace flags and acts of God: Had the opposite effect
  • Is my query doing anything dumb? Nope.
  • Are my indexes eating crayons? Nope.

Drawing Board


The problem with OPTIMIZE FOR is that… it’s picky. You can’t just optimize for anything.

For example, you can’t do this:

DECLARE 
    @s datetime = '19000101';

SELECT 
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.CreationDate >= @s
OPTION
(
    OPTIMIZE FOR (@s = GETDATE())
);

And you can’t do this:

DECLARE 
    @s datetime = '19000101',
    @d datetime = GETDATE();

SELECT 
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.CreationDate >= @s
OPTION
(
    OPTIMIZE FOR (@s = @d)
);

We get a nasty message.

Msg 320, Level 15, State 1, Line 26
The compile-time variable value for '@s' in the OPTIMIZE FOR clause must be a literal.

Ever Heard Of A Chef Who Can’t Cook?


The solution is, as usual, dynamic SQL, but there’s a catch. Because there’s always a catch.

For example this works, but if you run it a minute or so apart, you get multiple plans in the cache.

DBCC FREEPROCCACHE;

DECLARE
    @sql nvarchar(MAX) = N'',
    @s datetime = '19000101',
    @d datetime = GETDATE(),
    @i int = 0;

WHILE @i < 10
BEGIN
    
    SELECT
       @sql = N'
        SELECT 
            c = COUNT_BIG(*)
        FROM dbo.Users AS u
        WHERE u.CreationDate >= @s
        OPTION
        (
            OPTIMIZE FOR (@s = ''' + CONVERT(nvarchar(30), @d) + ''')
        );     
       ';
    
    EXEC sys.sp_executesql
        @sql,
        N'@s datetime',
        @s;
    
    SELECT 
        @i += 1;
    
END

EXEC sp_BlitzCache 
    @DatabaseName = 'StackOverflow';
[Image: dangit]

Are You Ready For Some Date Math?


Depending on how we want to address this, we can either:

  • Do some date math to go to the beginning of the current day
  • Do some date math to go to the end of the current day
  • Set the value to the furthest possible date in the future
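For the first two bullets, the usual date math idiom looks like this; the names are mine, and you’d plug the result in where @d gets set above:

SELECT
    start_of_today = 
        DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0),
    start_of_tomorrow = 
        DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()) + 1, 0);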

The first two cases should generally be fine. Saying the quiet part out loud, not a lot of plans survive a long time, either due to plan cache instability or other recompilation events, like from stats updates.

If you’re super-picky about that, go with the third option. This may also be considered the safest option, because a stats update might give you a histogram step for today’s value. The datetime max value will theoretically “never” be a histogram step value, but that depends on whether you let users do Stupid Things™.

DBCC FREEPROCCACHE;

DECLARE
    @sql nvarchar(MAX) = N'',
    @s datetime = '19000101',
    @d datetime = '99991231',
    @i int = 0;

WHILE @i < 10
BEGIN
    
    SELECT
       @sql = N'
        SELECT 
            c = COUNT_BIG(*)
        FROM dbo.Users AS u
        WHERE u.CreationDate >= @s
        OPTION
        (
            OPTIMIZE FOR (@s = ''' + CONVERT(nvarchar(30), @d) + ''')
        );     
       ';
    
    EXEC sys.sp_executesql
        @sql,
        N'@s datetime',
        @s;
    
    SELECT 
        @i += 1;
    
END

No matter how many times you run this, the plan will get reused and you’ll always have the off-histogram step.

Qualifying Events?


This is one of those “good ideas” I have for a specific circumstance without any other easy workarounds. I don’t suggest it as a general practice, and it certainly has some drawbacks that would make it dangerous in other circumstances.

I can’t easily reproduce the problem this solved locally, but I can show you why you probably don’t want to make it a habit.

[SQL Server Query Plan: eek]

In cases where you are able to get good cardinality estimates, this technique will hamper them greatly.

So, you know, last resort.

Thanks for reading!


Be Careful Where You Call Inline Table Valued Functions In SQL Server Queries

Keep’em Comin!


While helping someone tame a bunch of rather unfortunate scalar valued functions, we eventually hit a point where they were able to do some of the rewrites themselves. During testing, they ran into a situation where performance got worse when they made the switch over, and it wasn’t because an Eager Index Spool popped up.

I was able to come up with a demo that shows a reasonable enough performance difference in a couple queries using the same technique as I gave them to fix things.

So uh, here goes that.

Muppet


The query they were tuning had a couple OUTER APPLYs already written into it, and so they added the function on as another.

SELECT
    u.DisplayName,
    b.Name,
    nbi.AvgPostsPerDay
FROM dbo.Users AS u
OUTER APPLY
(
    SELECT 
        CreationDate = 
            MIN(p.CreationDate)
    FROM dbo.Posts AS p
    JOIN dbo.Votes AS v
        ON  v.PostId = p.Id
        AND v.VoteTypeId = 1
    WHERE p.OwnerUserId = u.Id
) AS p
OUTER APPLY
(
    SELECT TOP (1)
        b.Name
    FROM dbo.Badges AS b
    WHERE b.UserId = u.Id
    ORDER BY b.Date DESC
) AS b
OUTER APPLY dbo.no_bueno_inline(u.Id, p.CreationDate) AS nbi
WHERE u.Reputation >= 100000
ORDER BY u.Reputation DESC;

Since they didn’t want to lose rows to the function, they couldn’t use CROSS APPLY. Good enough.

Moutarde


But what they really wanted was to move the function up into the select list, like this:

SELECT
    u.DisplayName,
    b.Name,
    AvgPostsPerDay = 
    (
        SELECT 
            nbi.AvgPostsPerDay 
        FROM dbo.no_bueno_inline
        (
            u.Id, 
            p.CreationDate
        ) AS nbi
    )
FROM dbo.Users AS u
OUTER APPLY
(
    SELECT 
        CreationDate = 
            MIN(p.CreationDate)
    FROM dbo.Posts AS p
    JOIN dbo.Votes AS v
        ON  v.PostId = p.Id
        AND v.VoteTypeId = 1
    WHERE p.OwnerUserId = u.Id
) AS p
OUTER APPLY
(
    SELECT TOP (1)
        b.Name
    FROM dbo.Badges AS b
    WHERE b.UserId = u.Id
    ORDER BY b.Date DESC
) AS b
WHERE u.Reputation >= 100000
ORDER BY u.Reputation DESC;

That way you don’t lose any rows like you could with CROSS APPLY, and the optimizer is more likely to holler at the function later on in the query plan, since the values from it are only being projected — that’s fancy for selected.

Mapperoo


The full query plan is a bit much to untangle quickly in this post, but the timing difference is noticeable enough for my purposes:

[SQL Server Query Plan: tootin]

So if you ever end up rewriting a scalar valued function as an inline table valued function, make sure you test calling it in the same way. Moving query syntax around may produce logically equivalent results, but won’t always produce equivalent performance.

Thanks for reading!
