Things SQL Server vNext Should Address: Common Table Expression Materialization

Repetition Is Everything


I know what you’re thinking: this is another post that asks for a hint to materialize CTEs.

You’re wrong. I don’t want another hint that I can’t add to queries to solve a problem because the code is coming from a vendor or ORM.

No, I want the optimizer to smarten up about this sort of thing, detect CTE re-use, and use one of the New And Improved Spools™ to cache results.

Let’s take a look at where this would come in handy.

Standalone


If we take this query by itself and look at the execution plan, it conveniently shows one access of Posts and Users, and a single join between the two.

SELECT
    u.Id AS UserId,
    u.DisplayName,
    p.Id AS PostId,
    p.AcceptedAnswerId,
    TotalScore = 
        SUM(p.Score)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
WHERE u.Reputation > 100000
GROUP BY 
    u.Id, 
    u.DisplayName,
    p.Id, 
    p.AcceptedAnswerId
HAVING SUM(p.Score) > 1
SQL Server Query Plan
invitational

Now, let’s go MAKE THINGS MORE READABLE!!!

Ality


WITH spool_me AS
(
    SELECT
        u.Id AS UserId,
        u.DisplayName,
        p.Id AS PostId,
        p.AcceptedAnswerId,
        TotalScore = 
            SUM(p.Score)
    FROM dbo.Users AS u
    JOIN dbo.Posts AS p
        ON p.OwnerUserId = u.Id
    WHERE u.Reputation > 100000
    GROUP BY 
        u.Id, 
        u.DisplayName,
        p.Id, 
        p.AcceptedAnswerId
    HAVING SUM(p.Score) > 1
)
SELECT
    a.UserId,
    a.DisplayName,
    a.PostId,
    a.AcceptedAnswerId,
    a.TotalScore,
    q.UserId,
    q.DisplayName,
    q.PostId,
    q.AcceptedAnswerId,
    q.TotalScore
FROM spool_me AS a
JOIN spool_me AS q
    ON a.PostId = q.AcceptedAnswerId
ORDER BY a.TotalScore DESC;

Wowee. We really done did it. But now what does the query plan look like?

SQL Server Query Plan
oh, you

There are now two accesses of Posts and two accesses of Users, and three joins (one Hash Join isn’t in the screen cap).

Detection


Obviously, the optimizer knows it has to build a query plan that reflects the CTE being joined.

Since it’s smart enough to do that, it should be smart enough to use a Spool to cache things and prevent the additional accesses.

Comparatively, using a #temp table to simulate a Spool, is about twice as fast. Here’s the CTE plan:

SQL Server Query Plan
double

Here’s the Spool Simulator Plan™

SQL Server Query Plan
professionals

Given the optimizer’s penchant for spools, this would be another chance for it to shine on like the crazy diamond it is.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount on to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Things SQL Server vNext Should Address: Local Variable Estimates

And Then The World


I’ve written at length about what local variables do to queries, so I’m not going to go into it again here.

What I do want to talk about are better alternatives to what you currently have to do to fix issues:

  • RECOMPILE the query
  • Pass the local variable to a stored procedure
  • Pass the local variable to dynamic SQL

It’s not that I hate those options, they’re just tedious. Sometimes I’d like the benefit of recompiling with local variables without all the other strings that come attached to recompiling.

Hint Me Baby One More Time


Since I’m told people rely on this behavior to fix certain problems, you would probably need a few different places to and ways to alter this behavior:

  • Database level setting
  • Query Hint
  • Variable declaration

Database level settings are great for workloads you can’t alter, either because the queries come out of a black box, or you use an ORM and queries… come out of a nuclear disaster area.

Query hints are great if you want all local variables to be treated like parameters. But you may not want that all the time. I mean, look: you all do wacky things and you’re stuck in your ways. I’m not kink shaming here, but facts are facts.

You have to draw the line somewhere and that somewhere is always “furries”.

And then local variables.

It may also be useful to allow local variables to be declared with a special property that will allow the optimizer to treat them like parameters. Something like this would be easy enough:

DECLARE @p int PARAMETER = 1;

Hog Ground


Given that in SQL Server 2019 table variables got deferred compilation, I think this feature is doable.

Of course, it’s doable today if you’ve got a debugger and don’t mind editing memory space.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount on to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

When Should You Use Table Variables In SQL Server? When Queries Execute Thousands Of Times A Minute

Screech



Links:

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount on to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

USE StackOverflow2013;
EXEC dbo.DropIndexes;
SET NOCOUNT ON;
DBCC FREEPROCCACHE;
GO 

CREATE INDEX whatever 
ON dbo.Posts
    (OwnerUserId) 
INCLUDE
    (Score);
GO 

CREATE OR ALTER PROCEDURE dbo.TempTableTest (@Id INT)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;
    
    CREATE TABLE #t(i INT NOT NULL);

    INSERT 
        #t ( i )
    SELECT 
        p.Score
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = @Id;

END;
GO 

CREATE OR ALTER PROCEDURE dbo.TableVariableTest (@Id INT)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;
    
    DECLARE @t TABLE (i INT NOT NULL);

    INSERT 
        @t ( i )
    SELECT 
        p.Score
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = @Id;

END;
GO 


CREATE OR ALTER PROCEDURE dbo.TempTestWrapper (@TestProc sysname)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

DECLARE @rando int = 
    ((ABS(CHECKSUM(NEWID())) % 21195018) + 1); /*this is the max id in posts for SO2013*/

IF @TestProc = N'TempTableTest'
    BEGIN
        EXEC dbo.TempTableTest @rando;
    END;

IF @TestProc = N'TableVariableTest'
    BEGIN
        EXEC dbo.TableVariableTest @rando;
    END;

END;
GO 

/*Testing*/
EXEC dbo.TempTestWrapper @TestProc = 'TempTableTest';
EXEC dbo.TempTestWrapper @TestProc = 'TableVariableTest';

/*

ostress -SSQL2019S -d"StackOverflow2013" -Q"EXEC dbo.TempTestWrapper @TestProc = 'TempTableTest';" -U"ostress" -P"ostress" -q -n300 -r1000 -o"C:\temp\crap"

ostress -SSQL2019S -d"StackOverflow2013" -Q"EXEC dbo.TempTestWrapper @TestProc = 'TableVariableTest';" -U"ostress" -P"ostress" -q -n300 -r1000 -o"C:\temp\crap"

*/

SQL Server Spool Operators Are Just Crappy Temp Tables

But My Tempdb


Using the scenario from yesterday’s post as an example of why you might want to think about rewriting queries with Table Spools in them to use temp tables instead, look how the optimizer chooses a plan with an Eager Table Spool.

The “Eager” part means the entire set of rows is loaded into a temporary object at once.

SQL Server Query Plan
drugas

That’s a lot of rows, innit? Stick some commas in there, and you might just find yourself staring down the barrel of a nine digit number.

Worse, we spend a long time loading data into the spool, and doing so in a serial zone. There’s no good way to know exactly how long the load is because of odd operator times.

If you recall yesterday’s post, the plan never goes back to parallel after that, either. It runs for nearly 30 minutes in total.

Yes Your Tempdb


If you’re gonna be using that hunka chunka tempdb anyway, you might as well use it efficiently. Unless batch mode is an option for you, either as Batch Mode On Rowstore, or tricking the optimizer, this might be your best bet.

Keep in mind that Standard Edition users have an additional limitation where Batch Mode queries are limited to a DOP of 2, and don’t have access to Batch Mode On Rowstore as of this writing. The DOP limitation especially might make the trick unproductive compared to alternatives that allow for MOREDOP.

For example, if we dump that initial join into a temp table, it only takes about a minute to get loaded at a DOP of 8. That is faster than loading data into the spool (I mean, probably. Just look at that thing.).

SQL Server Query Plan
sweet valley high

The final query to do the distinct aggregations takes about 34 seconds.

SQL Server Query
lellarap

Another benefit is that each branch that does a distinct aggregation is largely in a parallel zone until the global aggregate.

SQL Server Query
muggers

In total, both queries finish in about a 1:45. A big improvement from nearly 30 minutes relying on the Eager Table Spool and processing all of the distinct aggregates in a serial zone. The temp table here doesn’t have that particular shortcoming.

In the past, I’ve talked a lot about Eager Index Spools. They have a lot of problems too, many of which are worse. Of course, we need indexes to fix those, not temp tables.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount on to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

SELECT
    v.PostId,
    v.UserId,
    v.BountyAmount,
    v.VoteTypeId,
    v.CreationDate
INTO #better_spool
FROM dbo.Votes AS v
JOIN dbo.Posts AS p
    ON p.Id = v.PostId;

SELECT
    PostId = COUNT_BIG(DISTINCT s.PostId),
    UserId = COUNT_BIG(DISTINCT s.UserId), 
    BountyAmount = COUNT_BIG(DISTINCT s.BountyAmount), 
    VoteTypeId = COUNT_BIG(DISTINCT s.VoteTypeId), 
    CreationDate = COUNT_BIG(DISTINCT s.CreationDate)
FROM #better_spool AS s;

 

What’s Really Different About In-Memory Table Variables In SQL Server?

Kendra, Kendra, Kendra


My dear friend Kendra asked… Okay, look, I might have dreamed this. But I maybe dreamed that she asked what people’s Cost Threshold For Blogging™ is. Meaning, how many times do you have to get asked a question before you write about it.

I have now heard people talking and asking about in-memory table variables half a dozen times, so I guess here we are.

Talking about table variables.

In memory.

Yes, Have Some


First, yes, they do help relieve tempdb contention if you have code that executes under both high concurrency and frequency. And by high, I mean REALLY HIGH.

Like, Snoop Dogg high.

Because you can’t get rid of in memory stuff, I’m creating a separate database to test in.

Here’s how I’m doing it!

CREATE DATABASE trash;

ALTER DATABASE trash 
ADD FILEGROUP trashy 
    CONTAINS MEMORY_OPTIMIZED_DATA ;
     
ALTER DATABASE trash 
ADD FILE 
(
    NAME=trashcan, 
    FILENAME='D:\SQL2019\maggots'
) 
TO FILEGROUP trashy;

USE trash;

CREATE TYPE PostThing 
AS TABLE
(
    OwnerUserId int,
    Score int,
    INDEX o HASH(OwnerUserId)
    WITH(BUCKET_COUNT = 100)
) WITH
(
    MEMORY_OPTIMIZED = ON
);
GO

Here’s how I’m testing things:

CREATE OR ALTER PROCEDURE dbo.TableVariableTest(@Id INT)
AS
BEGIN

    SET NOCOUNT, XACT_ABORT ON;
    
    DECLARE @t AS PostThing;
    DECLARE @i INT;

    INSERT @t 
        ( OwnerUserId, Score )
    SELECT 
        p.OwnerUserId,
        p.Score
    FROM Crap.dbo.Posts AS p
    WHERE p.OwnerUserId = @Id;

    SELECT 
        @i = SUM(t.Score)
    FROM @t AS t
    WHERE t.OwnerUserId = 22656
    GROUP BY t.OwnerUserId;

    SELECT 
        @i = SUM(t.Score)
    FROM @t AS t
    GROUP BY t.OwnerUserId;

END;
GO

Hot Suet


So like, the first thing I did was use SQL Query Stress to run this on a bunch of threads, and I didn’t see any tempdb contention.

So that’s cool. But now you have a bunch of stuff taking up space in memory. Precious memory. Do you have enough memory for all this?

Marinate on that.

Well, okay. Surely they must improve on all of the issues with table variables in some other way:

  • Modifications can’t go parallel
  • Bad estimates
  • No column level stats

But, nope. No they don’t. It’s the same crap.

Minus the tempdb contetion.

Plus taking up space in memory.

But 2019


SQL Server 2019 does offer the same table level cardinality estimate for in-memory table variables as regular table variables.

If we flip database compatibility levels to 150, deferred compilation kicks in. Great. Are you on SQL Server 2019? Are you using compatibility level 150?

Don’t get too excited.

Let’s give this a test run in compat level 140:

DECLARE @i INT = 22656;
EXEC dbo.TableVariableTest @Id = @i;
SQL Server Query Plan
everything counts in large amounts

Switching over to compat level 150:

SQL Server Query Plan
yeaaahhhhh

Candy Girl


So what do memory optimized table variables solve?

Not the problem that table variables in general cause.

They do help you avoid tempdb contention, but you trade that off for them taking up space in memory.

Precious memory.

Do you have enough memory?

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount on to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Tracking Row Changes With Temporal Columns In SQL Server

Extra, Extra


Temporal tables are cool, but keeping all that history can be stressful. Change Tracking adds overhead to every transaction, and requires Snapshot Isolation to be successful. Change Data Capture can also run into problems scanning the transaction log, depending on transaction volume, etc.

Change Data Capture is also a little unfortunate in that it doesn’t track schema changes like adding or dropping columns, or changing column data types. Change Tracking doesn’t either, it just tracks the keys of what changed when data is modified.

Temporal Tables are different, because it will add new, and drop removed columns from the history table as they change in the base table. It’ll also push data type changes across. But the history table will not have the various permutations tracked. It just mirrors current state.

Just The Columns, Ma’am


Let’s say your needs aren’t robust enough to need any one of those highly specialized features, or even triggers to move data around when it changes.

You’re perfectly free and able to add the tracking columns that temporal tables use to your base tables, but adding them is far from free. When I added them to the 17 million row Posts table, it took about 40 seconds. My laptop doesn’t suck, either.

To show you a little how it works, let’s create a copy of the Votes table from Stack Overflow.

CREATE TABLE dbo.Votes_Tracked
(
    Id int NOT NULL,
    PostId int NOT NULL,
    UserId int NULL,
    BountyAmount int NULL,
    VoteTypeId int NOT NULL,
    CreationDate datetime NOT NULL,
    LastModified datetime2 GENERATED ALWAYS AS ROW start NOT NULL
        CONSTRAINT DF_LastModified DEFAULT (SYSDATETIME()),
    JunkDate datetime2 GENERATED ALWAYS AS ROW END HIDDEN NOT NULL
        CONSTRAINT DF_JunkDate DEFAULT ('9999-12-31 23:59:59.9999999'),
    PERIOD FOR SYSTEM_TIME (LastModified, JunkDate),
    CONSTRAINT PK_Votes_Id
        PRIMARY KEY CLUSTERED (Id ASC)
);
GO

Note that you need two columns to define the “period for system time”, and one of them will always be useless. That’s why I called it JunkDate, and not, like, whatever. But the good news is you can define that column as HIDDEN so that it doesn’t show up in all your queries.

Now we can stick some data in there and see how it works.

INSERT 
    dbo.Votes_Tracked WITH (TABLOCK)
(
    Id,
    PostId,
    UserId,
    BountyAmount,
    VoteTypeId,
    CreationDate
)
SELECT
    Id,
    PostId,
    UserId,
    BountyAmount,
    VoteTypeId,
    CreationDate
FROM StackOverflow2013.dbo.Votes AS v
WHERE v.CreationDate >= '20130101';

Looking Around


The table data looks like this:

woah man

If we run an update:

UPDATE v
    SET v.BountyAmount = 9999
FROM dbo.Votes_Tracked AS v
WHERE v.VoteTypeId = 7;
temporally yours

Note that these screen caps were taken without the HIDDEN keyword added to the table definition: that was an after thought recommended by my friend Peter.

Whatabouts?


Of course, if you remove rows from the table, they’re just gone. You’d still need a trigger to cover deletes, if you need to track those.

And if you want to remove those columns later, it takes a little bit of tweaking.

ALTER TABLE dbo.Votes_Tracked DROP CONSTRAINT DF_LastModified, DF_JunkDate;
ALTER TABLE dbo.Votes_Tracked DROP COLUMN JunkDate, LastModified;

Msg 13588, Level 16, State 1, Line 63
Column 'JunkDate' in table 'Crap.dbo.Votes_Tracked' cannot be dropped because it is a part of period definition.

Of course, the table isn’t system versioned, so this command will also fail:

ALTER TABLE dbo.Votes_Tracked SET (SYSTEM_VERSIONING = OFF);

Msg 13591, Level 16, State 1, Line 66
SYSTEM_VERSIONING is not turned ON for table 'Crap.dbo.Votes_Tracked'.

If you want to remove them, you’ll need to use this:

ALTER TABLE dbo.Votes_Tracked DROP PERIOD FOR SYSTEM_TIME;

Now you’ll be able to remove them.

Is This A Good Idea?


Well, it depends on what you need. If you just need to know when a row changed, and you don’t need to know what changed or who changed it, it can work in a more automated way than triggers. It does require an additional column, which isn’t ideal, but it’s not a gigantic column, and you don’t need to worry about indexing it because the data is junk.

I haven’t found any ridiculous downsides to this, but I’ll keep this updated if I do.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount on to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Don’t Be Afraid Of tempdb In SQL Server

It’s There For A Reason


When tuning queries, one pattern I see over and over again is people running crazy-long queries. Maybe they worked well-enough at some point, but over the years they just kept getting slower and slower.

Sometimes there are comments, and other times there’s enough domain knowledge on the call to understand how a query ended up in the shape it’s in. One persistent idea is that tempdb is something to be avoided. Either because it was “slow” or to avoid contention.

Granted, if a query has been around long enough, these may have been valid concerns at some point. In general though, temp tables (the # kind, not the @ kind) can be quite useful when query tuning.

You Might Be Using It, Anyway


Even if there’s some rule against directly using temp tables, queries can end up using tempdb by the caseload anyway.

Consider that Spool operators explicitly execute in tempdb, any spills will go to tempdb, and work tables that are used in a number of circumstances occur in tempdb. The bigger and more complicated your queries are, the more likely you are to run into cases where the optimizer Spools, Spills, or use some other workspace area in tempdb in your query plan.

Worse, optimizations available for temp tables aren’t available to on-the-fly operators. You also lose the ability to take further action by indexing your temp tables, etc.

It’s Often Easier Than Other Options


Many times when tuning queries, I’ll be puzzled by the optimizer’s choices. Sometimes it’s join type, other times it’s join order, or something else. Perhaps the most common reason is some misestimation, of course.

Query and index hints are great to experiment with, but are often unsatisfying as permanent fixes. I’m not saying to never use them, but you should explore other options first. In other words, keep temp tables on the table.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount on to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Making The Most Of Temp Tables Part 4: Batch Mode

Le Big Zoom Zoom


When you have queries that need to process a lot of data, and probably do some aggregations over that lot-of-data, batch mode is usually the thing you want.

Originally introduced to accompany column store indexes, it works by allowing CPUs to apply instructions to up to 900 rows at a time.

It’s a great thing to have in your corner when you’re tuning queries that do a lot of work, especially if you find yourself dealing with pesky parallel exchanges.

Oh, Yeah


One way to get that to happen is to use a temp table with a column store index on it.

SELECT 
    v.UserId, 
    SUM(v.BountyAmount) AS SumBounty
FROM dbo.Comments AS c
JOIN dbo.Votes AS v
    ON  v.PostId = c.PostId
    AND v.UserId = c.UserId
GROUP BY v.UserId
ORDER BY SumBounty DESC;

CREATE TABLE #t(id INT, INDEX c CLUSTERED COLUMNSTORE);

SELECT 
    v.UserId, 
    SUM(v.BountyAmount) AS SumBounty
FROM dbo.Comments AS c
JOIN dbo.Votes AS v
    ON  v.PostId = c.PostId
    AND v.UserId = c.UserId
LEFT JOIN #t AS t 
    ON 1 = 0
GROUP BY v.UserId
ORDER BY SumBounty DESC;

Keep in mind, this trick won’t work if you’re on SQL Server 2019 and using in memory tempdb. But aside from that, you’re free to rock and roll with it.

If you end up using this enough, you may just wanna create a real table to use, anyway.

Remarkable!


If we look at the end (or beginning, depending on how you read your query plans) just to see the final times, there’s a pretty solid difference.

SQL Server Query Plan
you can’t make me

The first query takes around 10 seconds, and the second query takes around 4 seconds. That’s a pretty handsome improvement without touching anything else.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount on to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Making The Most Of Temp Tables Part 3: More Opportune Indexes To Make Queries Go Faster

I Know You


You have too many indexes on too many tables already, and the thought of adding more fills you with a dread that has a first, middle, last, and even a confirmation name.

This is another place where temp tables can save your bacon, because as soon as the query is done they basically disappear.

Forever. Goodbye.

Off to buy a pack of smokes.

That Yesterday


In yesterday’s post, we looked at how a temp table can help you materialize an expression that would otherwise be awkward to join on.

If we take that same query, we can see how using the temp table simplifies indexing.

SELECT
    p.OwnerUserId,
    SUM(p.Score) AS TotalScore,
    COUNT_BIG(*) AS records,
    CASE WHEN p.PostTypeId = 1 
         THEN p.OwnerUserId
         WHEN p.PostTypeId = 2
         THEN p.LastEditorUserId
    END AS JoinKey
INTO #Posts
FROM dbo.Posts AS p
WHERE p.PostTypeId IN (1, 2)
AND   p.Score > 100
GROUP BY CASE
             WHEN p.PostTypeId = 1 
             THEN p.OwnerUserId
             WHEN p.PostTypeId = 2 
             THEN p.LastEditorUserId
         END,
         p.OwnerUserId;

CREATE CLUSTERED INDEX c ON #Posts(JoinKey);

SELECT *
FROM #Posts AS p
WHERE EXISTS
(
    SELECT 1/0
    FROM dbo.Users AS u
    WHERE p.JoinKey = u.Id
);

Rather than have to worry about how to handle a bunch of columns across the where and join and select, we can just stick a clustered index on the one column we care about doing anything relational with to get the final result.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount on to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Making The Most Of SQL Server Temporary Tables Part 2: Materializing Expressions

Bad Data


A lot of the time when I see queries that are written with all sorts of gymnastics in the join or where clause and I ask some questions about it, people usually start complaining about the design of the table.

That’s fine, but when I ask about changing the design, everyone gets quiet. Normalizing tables, especially for Applications Of A Certain Age™ can be a tremendously painful project. This is why it’s worth it to get things right the first time. Simple!

Rather than make someone re-design their schema in front of me, often times a temp table is a good workaround.

Egg Splat


Let’s say we have a query that looks like this. Before you laugh, and you have every right to laugh, keep in mind that I see queries like this all the time.

They don’t have to be this weird to qualify. You can try this if you have functions like ISNULL, SUBSTRING, REPLACE, or whatever in joins and where clauses, too.

SELECT
    p.OwnerUserId,
    SUM(p.Score) AS TotalScore,
    COUNT_BIG(*) AS records
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON u.Id = CASE 
                   WHEN p.PostTypeId = 1 
                   THEN p.OwnerUserId
                   WHEN p.PostTypeId = 2
                   THEN p.LastEditorUserId
              END
WHERE p.PostTypeId IN (1, 2)
AND   p.Score > 100
GROUP BY p.OwnerUserId;

There’s not a great way to index for this, and sure, we could rewrite it as a UNION ALL, but then we’d have two queries to index for.

Sometimes getting people to add indexes is hard, too.

People are weird. All day weird.

Steak Splat


You can replace it with a query like this, which also allows you to index a single column in a temp table to do your correlation.

SELECT
    p.OwnerUserId,
    SUM(p.Score) AS TotalScore,
    COUNT_BIG(*) AS records,
    CASE WHEN p.PostTypeId = 1 
         THEN p.OwnerUserId
         WHEN p.PostTypeId = 2
         THEN p.LastEditorUserId
    END AS JoinKey
INTO #Posts
FROM dbo.Posts AS p
WHERE p.PostTypeId IN (1, 2)
AND   p.Score > 100
GROUP BY CASE
             WHEN p.PostTypeId = 1 
             THEN p.OwnerUserId
             WHEN p.PostTypeId = 2 
             THEN p.LastEditorUserId
         END,
         p.OwnerUserId;

SELECT *
FROM #Posts AS p
WHERE EXISTS
(
    SELECT 1/0
    FROM dbo.Users AS u
    WHERE p.JoinKey = u.Id
);

Remember that temp tables are like a second chance to get schema right. Don’t waste those precious chances.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount on to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.