The How To Write SQL Server Queries Correctly Cheat Sheet: Select List Subqueries

Flash Boy


I think subqueries in select lists are very neat things. You can use them to skip a lot of additional join logic, which can have all sorts of strange repercussions on query optimization, particularly if you use have to use left joins to avoid eliminating results.

Subqueries do have their limitations:

  • They can only return one row
  • They can only return one column

But used in the right way, they can be an excellent method to retrieve results without worrying about what kind of join you’re doing, and how the optimizer might try to rearrange it into the mix.

Since subqueries are in the select list, and can’t eliminate rows from results, you will most commonly see them incorporated late in query plans (much further to the left) as outer joins. The optimizer is generally smart enough to retrieve data for select list subqueries after as much other filtering that can have been applied is applied, so they can be evaluated for as few rows in the outer results as possible.

The most important thing you can do as a developer to make sure your select list subqueries are fast is to make sure you have good supporting indexes for them.

Well Done


Let’s start with this query:

SELECT
    u.Id,
    u.DisplayName,
    u.Reputation,
    TopQuestionScore =     
    (
        SELECT TOP (1)
            p.Score
        FROM dbo.Posts AS p
        WHERE p.PostTypeId = 1
        AND   p.OwnerUserId = u.Id
        ORDER BY
            p.Score DESC,
            p.Id DESC
    ),
    TopAnswerScore =     
    (
        SELECT TOP (1)
            p.Score
        FROM dbo.Posts AS p
        WHERE p.PostTypeId = 2
        AND   p.OwnerUserId = u.Id
        ORDER BY
            p.Score DESC,
            p.Id DESC
    ),
    TotalPosts =    
    (
        SELECT
            COUNT_BIG(*)
        FROM dbo.Posts AS p
        WHERE p.OwnerUserId = u.Id
        AND   p.PostTypeId IN (1, 2)
    )
FROM dbo.Users AS u
WHERE u.Reputation > 500000
ORDER BY
    u.Reputation DESC;

The goal is to find every User with a Reputation over 500,000, and then find their:

  • Top scoring question (with a unique tie-breaker on most recent post id)
  • Top scoring answer (with a unique tie-breaker on most recent post id)
  • Total questions and answers

You might look at this query with a deep sense of impending dread, wondering why we should make three trips to the Posts table to get this information. I totally get that.

But let’s say we have these indexes in place:

CREATE INDEX 
    u
ON dbo.Users 
    (Reputation, Id)
INCLUDE 
    (DisplayName)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

CREATE INDEX
    p
ON dbo.Posts
    (OwnerUserId, PostTypeId, Score)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

They have everything we need to support quick, navigational lookups.

Query Planner


The query plan for this arrangement looks like this, finishing in 23 milliseconds.

sql server query plan
all i need

If you write select list subqueries, and they’re terribly slow, there’s a very good chance that the indexes you have in place are not up to the job, particularly if you see Eager Index Spools in the query plan.

All of the time in the plan is spent in the final subquery, that counts the total number of questions and answers. But even that, at 23 milliseconds, is not worth heaving our chests over.

Three round trips are not at all a problem here, but let’s compare.

One Way Ticket


I’m not opposed to experimentation. After all, it’s a great way to learn, observe, and become enraged with the state of query optimization generally.

Here are two rewrites of the above query, to only make one trip to the Posts table. The first one uses a derived join, and the second uses apply. They’re both written to use outer joins, to match the semantics of the three subqueries:

/*
Join
*/
SELECT
    u.Id,
    u.DisplayName,
    u.Reputation,
    p.TopQuestionScore,
    p.TopAnswerScore,
    p.TotalPosts
FROM dbo.Users AS u
LEFT JOIN
(
    SELECT
        p.OwnerUserId,
        TopQuestionScore = 
            MAX(CASE WHEN p.PostTypeId = 1 THEN p.Score ELSE NULL END),
        TopAnswerScore = 
            MAX(CASE WHEN p.PostTypeId = 2 THEN p.Score ELSE NULL END),
        TotalPosts = 
            COUNT_BIG(*)
    FROM dbo.Posts AS p
    WHERE p.PostTypeId IN (1, 2)
    GROUP BY
        p.OwnerUserId
) AS p
  ON p.OwnerUserId = u.Id
WHERE u.Reputation > 500000
ORDER BY
    u.Reputation DESC;

/*
Apply
*/
SELECT
    u.Id,
    u.DisplayName,
    u.Reputation,
    p.TopQuestionScore,
    p.TopAnswerScore,
    p.TotalPosts
FROM dbo.Users AS u
OUTER APPLY
(
    SELECT
        p.OwnerUserId,
        TopQuestionScore = 
            MAX(CASE WHEN p.PostTypeId = 1 THEN p.Score ELSE NULL END),
        TopAnswerScore = 
            MAX(CASE WHEN p.PostTypeId = 2 THEN p.Score ELSE NULL END),
        TotalPosts = 
            COUNT_BIG(*)
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id 
    AND   p.PostTypeId IN (1, 2)
    GROUP BY
        p.OwnerUserId
) AS p
WHERE u.Reputation > 500000
ORDER BY
    u.Reputation DESC;

A somewhat brief digression here: Query rewrites to use specific syntax arrangements are often not available in ORMs. Many times while working with clients, we’ll stumble across a bushel of quite painful application-generated queries. I’ll show them useful rewrites to improve things, and we’ll all marvel together at how much better things are. I’ll suggest implementing the rewrite as a stored procedure, and all hell will break loose.

Please don’t be one of those developers. Understand the limitations of the technology stack you’re working with. Not everything produced by code is good.

Compare and Contrast


In this case, both of the attempts at rewrites result in identical query plans. The optimizer does a good job here, but both of the single-trip queries is about 2x slower than the original.

In this case, that difference will look and feel microscopic. And it is, mostly because I provided indexes so good that you could write this query any which way and it would work out pretty well.

sql server query plan
back of the van

While one round trip certainly felt more efficient than three, each trip from outer to inner side of the nested loops had a bit more work to do each time, and that added up.

It’s nothing consequential here, but you may run into plenty of situations where it’s far worse (or to be fair, far better).

For me, the real advantage of writing out the three separate subqueries is to better understand which one(s) do the most work, and might need additional work done to make them fast.

When you do everything all at once, you have no idea which piece is responsible for slowdowns. We know from the very first query plan that getting the full count does the most work, but that wouldn’t be obvious to me, you, or anyone else looking at the two query plans in this section.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Join me In Boston May 10 For A Full Day Of SQL Server Performance Tuning Training

Spring Training


This May, I’ll be presenting my full day training session The Foundations Of SQL Server Performance Tuning.

All attendees will get free access for life to my SQL Server performance tuning training. That’s about 25 hours of great content.

Get your tickets here for this event, taking place Friday, May 10th 2024 at the Microsoft Offices in Burlington.

Here’s what I’ll be presenting:

The Foundations Of SQL Server Performance Tuning

Session Abstract:

Whether you want to be the next great query tuning wizard, or you just need to learn how to start solving tough business problems at work, you need a solid understanding of not only what makes things fast, but also what makes them slow.

I work with consulting clients worldwide fixing complex SQL Server performance problems. I want to teach you how to do the same thing using the same troubleshooting tools and techniques I do.

I’m going to crack open my bag of tricks and show you exactly how I find which queries to tune, indexes to add, and changes to make. In this day long session, you’re going to learn about hardware, query rewrites that work, effective index design patterns, and more.

Before you get to the cutting edge, you need to have a good foundation. I’m going to teach you how to find and fix performance problems with confidence.

Event Details:

Get your tickets here for this event!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

bit Obscene Episode 2: SQL Server Agent Jobs

bit Obscene Episode 2: SQL Server Agent Jobs


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

SQL Server Index Design: Getting Key Column Order Right

SQL Server Index Design: Getting Key Column Order Right


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

How To Evaluate Index Effectiveness While Tuning SQL Server Queries

How To Evaluate Index Effectiveness While Tuning SQL Server Queries


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

How To Tell If Lazy Table Spools Are Slowing Your SQL Server Queries Down, And Fix Them

How To Tell If Lazy Table Spools Are Slowing Your SQL Server Queries Down, And Fix Them


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Join me In Boston May 10 For A Full Day Of SQL Server Performance Tuning Training

Spring Training


This May, I’ll be presenting my full day training session The Foundations Of SQL Server Performance Tuning.

All attendees will get free access for life to my SQL Server performance tuning training. That’s about 25 hours of great content.

Get your tickets here for this event, taking place Friday, May 10th 2024 at the Microsoft Offices in Burlington.

Here’s what I’ll be presenting:

The Foundations Of SQL Server Performance Tuning

Session Abstract:

Whether you want to be the next great query tuning wizard, or you just need to learn how to start solving tough business problems at work, you need a solid understanding of not only what makes things fast, but also what makes them slow.

I work with consulting clients worldwide fixing complex SQL Server performance problems. I want to teach you how to do the same thing using the same troubleshooting tools and techniques I do.

I’m going to crack open my bag of tricks and show you exactly how I find which queries to tune, indexes to add, and changes to make. In this day long session, you’re going to learn about hardware, query rewrites that work, effective index design patterns, and more.

Before you get to the cutting edge, you need to have a good foundation. I’m going to teach you how to find and fix performance problems with confidence.

Event Details:

Get your tickets here for this event!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Why Multiple Seeks Are Slower For Some SQL Server Queries

Why Multiple Seeks Are Slower For Some SQL Server Queries


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

The How To Write SQL Server Queries Correctly Cheat Sheet: EXISTS and NOT EXISTS

Selectively Numb


The sort of wonderful thing about SQL is that it has many directives which are fairly easy to decipher and use appropriately.

The sort of awful thing about SQL is that it has just as many rules that are somewhat selectively applied as the English language itself.

I have my gripes and grievances with some of the choices, of course, and so will you as you delve further into the language. A petty example is that I sort of wish that SQL used GET instead of SELECT for retrieving data.

Very few people go to the store to select milk, eggs, steak, butter, salt, pepper, and scotch. Most of us just go get it. But enough about breakfast.

Let’s talk about two of the most overlooked and undervalued facilities in the SQL language: EXISTS and NOT EXISTS. Perhaps they’d get more traction is they were called THERE and NOT THERE, but but it would be perilously easy to start confusing your WHERE and THERE clause.

Often besmirched by someone who read a blog post about MySQL in 1998 as “subqueries, which are slower than joins”, they’re two of the most useful things you can grasp and use regularly.

Though they are a bit like subqueries, the columns that you select in an EXISTS or NOT EXISTS subquery can’t be used in the outer query. You can put whatever you want in the select list, from * to 1 to COUNT to 1/0 to the entire contents of the King James Bible, and it will never end up making even the dimmest difference in the world. Likewise, adding DISTINCT, TOP, or any other row-limiting device will do absolutely nothing to change the query plan or performance.

Get over yourself.

Both EXISTS and NOT EXISTS already set a row goal of 1, because all either one has to do is determine if a single row is there or not, just with the logic reversed for each.

Your First Mistakes


Let’s say someone asks you to gin up a list of Users who have Posted anything at all, but whose Reputation still floats at the dreaded 1.

Your first instinct would likely be to write a query that looks like this.

SELECT 
    u.Id,
    u.DisplayName
FROM dbo.Users AS u
JOIN dbo.Posts AS p
  ON u.Id = p.OwnerUserId
WHERE u.Reputation = 1
ORDER BY 
    u.Id;

But you’d quickly find yourself confronted by many duplicate row values, because that’s what one-to-many joins produce. Duplicates.

Your next move, tongue hanging out, sweating profusely, knuckles creaking, nearly paralyzed by the uncertainty of your continued human functioning, would be to do something like this:

SELECT DISTINCT 
    u.Id,
    u.DisplayName
FROM dbo.Users AS u
JOIN dbo.Posts AS p
  ON u.Id = p.OwnerUserId
WHERE u.Reputation = 1
ORDER BY 
    u.Id;

And, sure, with a small enough data set, this is an okay choice. You passed the pop quiz, hot shot. Your instinct to use DISTINCT was not wholly incorrect, but there’s a better way.

But as you start dealing with larger and more imposing sets of data, DISTINCT will no longer cut it.

What EXISTS Does Different


While EXISTS will still use a join to match rows between tables, the semantics are quite a bit different. It can move on once it has determined that a row is either there or not there.

You don’t need to add DISTINCT, grouping, or anything else to get the results you wanted in the first place.

SELECT 
    u.Id,
    u.DisplayName
FROM  dbo.Users AS u
WHERE u.Reputation = 1
AND EXISTS 
( 
    SELECT 
        1/0 
    FROM dbo.Posts AS p 
    WHERE p.OwnerUserId = u.Id 
)
ORDER BY 
    u.Id;

Once EXISTS locates a match, it moves on to the next value from the outer side (in this case the Users table), and attempts to find a match. If no match is found, the row is discarded, which is common to inner joins.

Where a lot of developers get hung up at first is in assuming that EXISTS and NOT EXISTS work like IN or NOT in, and they miss the inner where clause to tell the database which rows should match.

I’ve seen a lot of EXISTS queries written, quite incorrectly, like this:

SELECT 
    u.Id,
    u.DisplayName
FROM  dbo.Users AS u
WHERE u.Reputation = 1
AND EXISTS 
( 
    SELECT 
       p.OwnerUserId 
    FROM dbo.Posts AS p
)
ORDER BY 
    u.Id;

Which will, of course, return absolutely everything. Don’t do this.

The column you select inside of the EXISTS subquery does not infer any sort of matching logic.

Like I said before, it’s essentially discarded by the optimizer.

Your Second Mistakes


No half-assed SQL tutorial is complete without showing you the wrong way to find non-matching rows between two tables.

It will undoubtedly look something like this:

SELECT 
   records = 
       COUNT_BIG(u.Id)
FROM dbo.Users AS u
LEFT JOIN dbo.Posts AS p
  ON u.Id = p.OwnerUserId
WHERE p.Id IS NULL;

It’s not that this pattern is never better, it’s just that it shouldn’t be your go-to for each and every query with this goal in mind.

You take two tables, you join them together, and you add a predicate to your where clause to find rows where an ordinarily not-NULL column returns NULLs.

The problem is that SQL Server’s query optimizer doesn’t contain any logic to turn this into the type of query plan that you’d get using NOT EXISTS instead.

You end up needing to fully join any tables involved together, and then later on use a filter to remove rows where no match was found. This can be incredibly inefficient, especially on large data sets.

One may even be dealing with “big data” when the follies of this paradigm become quite clear.

A generally better approach to writing this type of query is to tell the database you’re using exactly what you’re after and exactly what you expect:

SELECT 
    records = 
        COUNT_BIG(u.Id)
FROM dbo.Users AS u
WHERE NOT EXISTS 
( 
    SELECT 
        1/0 
    FROM dbo.Posts AS p 
    WHERE p.OwnerUserId = u.Id 
);

Your developer-life will be a whole lot less confusing and tiresome if you arm yourself with options and alternatives, which means you’ll have lots of mental energy left over to, like, learn 17 new frameworks and really impress your friends.

Think of the frameworks.

Gear Up


You should make good use of the EXISTS and NOT EXISTS patterns in your queries when you don’t require any rows from another table, and you only need to validate if something is there or not.

In cases where you need to get information from another table, joins are likely the most direct path to getting back the data you need.

But this all brings up an interesting question: what if you want to get back information in the select list without adding in join clauses, worrying about inner, outer, full, or cross, and wondering silently if one day things might go pear shaped.

We’ll talk about that in the next post, when we go over correlated subqueries.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Vote For My Idea: SSMS Right Click To Open Execution Plans In A New Tab

SSMS Right Click To Open Execution Plans In A New Tab


Vote for my great idea here.

Sort of odd, I have no idea how this idea ended up in the category that it’s in. I was viewing suggestions for SQL Server/SSMS, but when I went to submit this one, it ended up in “Microsoft Entra”.

Maybe I missed something. Maybe someone from Microsoft will be nice enough to move it to the right place. Maybe not. Right now, it lives where it lives.

I use SSMS, because my primary job is SQL Server analysis, performance tuning, and general server fixin’. It’s far and away the most competent tool for the job (sort of like me).

Also sort of like me, there’s not a lot of stiff competition out there 😘

One problem I run into regularly is when I’m tuning a query, and I want to keep one of the resulting execution plans available, so I can:

  1. Compare it after I make some other changes and run the query again
  2. Do some additional analysis without worrying about accidentally losing the plan

The only way to do that is to save the plan, stick the XML in another tool, or keep opening new query tabs to run things in, where I won’t lose the plan.

I think it would be a reasonable and helpful extension of the current set of right-click menu options to be able to open a query plan in a new tab.

ssms right click
just one more won’t hurt

Vote for my great idea here.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.