Query Plan – Page 4 – Darling Data

SQL Server Index Design: Getting Key Column Order Right

Posted on March 7, 2024February 17, 2024 by Erik Darling

SQL Server Index Design: Getting Key Column Order Right

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

How To Evaluate Index Effectiveness While Tuning SQL Server Queries

Posted on March 6, 2024February 17, 2024 by Erik Darling

How To Evaluate Index Effectiveness While Tuning SQL Server Queries

Going Further

How To Tell If Lazy Table Spools Are Slowing Your SQL Server Queries Down, And Fix Them

Posted on March 5, 2024February 17, 2024 by Erik Darling

How To Tell If Lazy Table Spools Are Slowing Your SQL Server Queries Down, And Fix Them

Going Further

Join me In Boston May 10 For A Full Day Of SQL Server Performance Tuning Training

Posted on March 4, 2024February 16, 2024 by Erik Darling

Spring Training

This May, I’ll be presenting my full day training session The Foundations Of SQL Server Performance Tuning.

All attendees will get free access for life to my SQL Server performance tuning training. That’s about 25 hours of great content.

Get your tickets here for this event, taking place Friday, May 10th 2024 at the Microsoft Offices in Burlington.

Here’s what I’ll be presenting:

The Foundations Of SQL Server Performance Tuning

Session Abstract:

Whether you want to be the next great query tuning wizard, or you just need to learn how to start solving tough business problems at work, you need a solid understanding of not only what makes things fast, but also what makes them slow.

I work with consulting clients worldwide fixing complex SQL Server performance problems. I want to teach you how to do the same thing using the same troubleshooting tools and techniques I do.

I’m going to crack open my bag of tricks and show you exactly how I find which queries to tune, indexes to add, and changes to make. In this day long session, you’re going to learn about hardware, query rewrites that work, effective index design patterns, and more.

Before you get to the cutting edge, you need to have a good foundation. I’m going to teach you how to find and fix performance problems with confidence.

Event Details:

Get your tickets here for this event!

Going Further

Why Multiple Seeks Are Slower For Some SQL Server Queries

Posted on March 1, 2024February 16, 2024 by Erik Darling

Why Multiple Seeks Are Slower For Some SQL Server Queries

Going Further

The How To Write SQL Server Queries Correctly Cheat Sheet: EXISTS and NOT EXISTS

Posted on February 29, 2024March 1, 2024 by Erik Darling

Selectively Numb

The sort of wonderful thing about SQL is that it has many directives which are fairly easy to decipher and use appropriately.

The sort of awful thing about SQL is that it has just as many rules that are somewhat selectively applied as the English language itself.

I have my gripes and grievances with some of the choices, of course, and so will you as you delve further into the language. A petty example is that I sort of wish that SQL used GET instead of SELECT for retrieving data.

Very few people go to the store to select milk, eggs, steak, butter, salt, pepper, and scotch. Most of us just go get it. But enough about breakfast.

Let’s talk about two of the most overlooked and undervalued facilities in the SQL language: EXISTS and NOT EXISTS. Perhaps they’d get more traction is they were called THERE and NOT THERE, but but it would be perilously easy to start confusing your WHERE and THERE clause.

Often besmirched by someone who read a blog post about MySQL in 1998 as “subqueries, which are slower than joins”, they’re two of the most useful things you can grasp and use regularly.

Though they are a bit like subqueries, the columns that you select in an EXISTS or NOT EXISTS subquery can’t be used in the outer query. You can put whatever you want in the select list, from * to 1 to COUNT to 1/0 to the entire contents of the King James Bible, and it will never end up making even the dimmest difference in the world. Likewise, adding DISTINCT, TOP, or any other row-limiting device will do absolutely nothing to change the query plan or performance.

Get over yourself.

Both EXISTS and NOT EXISTS already set a row goal of 1, because all either one has to do is determine if a single row is there or not, just with the logic reversed for each.

Your First Mistakes

Let’s say someone asks you to gin up a list of Users who have Posted anything at all, but whose Reputation still floats at the dreaded 1.

Your first instinct would likely be to write a query that looks like this.

SELECT 
    u.Id,
    u.DisplayName
FROM dbo.Users AS u
JOIN dbo.Posts AS p
  ON u.Id = p.OwnerUserId
WHERE u.Reputation = 1
ORDER BY 
    u.Id;

But you’d quickly find yourself confronted by many duplicate row values, because that’s what one-to-many joins produce. Duplicates.

Your next move, tongue hanging out, sweating profusely, knuckles creaking, nearly paralyzed by the uncertainty of your continued human functioning, would be to do something like this:

SELECT DISTINCT 
    u.Id,
    u.DisplayName
FROM dbo.Users AS u
JOIN dbo.Posts AS p
  ON u.Id = p.OwnerUserId
WHERE u.Reputation = 1
ORDER BY 
    u.Id;

And, sure, with a small enough data set, this is an okay choice. You passed the pop quiz, hot shot. Your instinct to use DISTINCT was not wholly incorrect, but there’s a better way.

But as you start dealing with larger and more imposing sets of data, DISTINCT will no longer cut it.

What EXISTS Does Different

While EXISTS will still use a join to match rows between tables, the semantics are quite a bit different. It can move on once it has determined that a row is either there or not there.

You don’t need to add DISTINCT, grouping, or anything else to get the results you wanted in the first place.

SELECT 
    u.Id,
    u.DisplayName
FROM  dbo.Users AS u
WHERE u.Reputation = 1
AND EXISTS 
( 
    SELECT 
        1/0 
    FROM dbo.Posts AS p 
    WHERE p.OwnerUserId = u.Id 
)
ORDER BY 
    u.Id;

Once EXISTS locates a match, it moves on to the next value from the outer side (in this case the Users table), and attempts to find a match. If no match is found, the row is discarded, which is common to inner joins.

Where a lot of developers get hung up at first is in assuming that EXISTS and NOT EXISTS work like IN or NOT in, and they miss the inner where clause to tell the database which rows should match.

I’ve seen a lot of EXISTS queries written, quite incorrectly, like this:

SELECT 
    u.Id,
    u.DisplayName
FROM  dbo.Users AS u
WHERE u.Reputation = 1
AND EXISTS 
( 
    SELECT 
       p.OwnerUserId 
    FROM dbo.Posts AS p
)
ORDER BY 
    u.Id;

Which will, of course, return absolutely everything. Don’t do this.

The column you select inside of the EXISTS subquery does not infer any sort of matching logic.

Like I said before, it’s essentially discarded by the optimizer.

Your Second Mistakes

No half-assed SQL tutorial is complete without showing you the wrong way to find non-matching rows between two tables.

It will undoubtedly look something like this:

SELECT 
   records = 
       COUNT_BIG(u.Id)
FROM dbo.Users AS u
LEFT JOIN dbo.Posts AS p
  ON u.Id = p.OwnerUserId
WHERE p.Id IS NULL;

It’s not that this pattern is never better, it’s just that it shouldn’t be your go-to for each and every query with this goal in mind.

You take two tables, you join them together, and you add a predicate to your where clause to find rows where an ordinarily not-NULL column returns NULLs.

The problem is that SQL Server’s query optimizer doesn’t contain any logic to turn this into the type of query plan that you’d get using NOT EXISTS instead.

You end up needing to fully join any tables involved together, and then later on use a filter to remove rows where no match was found. This can be incredibly inefficient, especially on large data sets.

One may even be dealing with “big data” when the follies of this paradigm become quite clear.

A generally better approach to writing this type of query is to tell the database you’re using exactly what you’re after and exactly what you expect:

SELECT 
    records = 
        COUNT_BIG(u.Id)
FROM dbo.Users AS u
WHERE NOT EXISTS 
( 
    SELECT 
        1/0 
    FROM dbo.Posts AS p 
    WHERE p.OwnerUserId = u.Id 
);

Your developer-life will be a whole lot less confusing and tiresome if you arm yourself with options and alternatives, which means you’ll have lots of mental energy left over to, like, learn 17 new frameworks and really impress your friends.

Think of the frameworks.

Gear Up

You should make good use of the EXISTS and NOT EXISTS patterns in your queries when you don’t require any rows from another table, and you only need to validate if something is there or not.

In cases where you need to get information from another table, joins are likely the most direct path to getting back the data you need.

But this all brings up an interesting question: what if you want to get back information in the select list without adding in join clauses, worrying about inner, outer, full, or cross, and wondering silently if one day things might go pear shaped.

We’ll talk about that in the next post, when we go over correlated subqueries.

Thanks for reading!

Going Further

Vote For My Idea: SSMS Right Click To Open Execution Plans In A New Tab

Posted on February 28, 2024February 18, 2024 by Erik Darling

SSMS Right Click To Open Execution Plans In A New Tab

Vote for my great idea here.

Sort of odd, I have no idea how this idea ended up in the category that it’s in. I was viewing suggestions for SQL Server/SSMS, but when I went to submit this one, it ended up in “Microsoft Entra”.

Maybe I missed something. Maybe someone from Microsoft will be nice enough to move it to the right place. Maybe not. Right now, it lives where it lives.

I use SSMS, because my primary job is SQL Server analysis, performance tuning, and general server fixin’. It’s far and away the most competent tool for the job (sort of like me).

Also sort of like me, there’s not a lot of stiff competition out there 😘

One problem I run into regularly is when I’m tuning a query, and I want to keep one of the resulting execution plans available, so I can:

Compare it after I make some other changes and run the query again
Do some additional analysis without worrying about accidentally losing the plan

The only way to do that is to save the plan, stick the XML in another tool, or keep opening new query tabs to run things in, where I won’t lose the plan.

I think it would be a reasonable and helpful extension of the current set of right-click menu options to be able to open a query plan in a new tab.

ssms right click — just one more won’t hurt

Vote for my great idea here.

Thanks for reading!

Going Further

The How To Write SQL Server Queries Correctly Cheat Sheet: Joins

Posted on February 27, 2024March 1, 2024 by Erik Darling

So Many Choices

SQL Server is full of ~~landmines~~ options when you’re writing queries. For most queries, you don’t need much beyond the basics.

Think of your standard CRUD operations. Most don’t even require a join; they’re very straightforward. And hey, if you completely denormalize all your data to one huge table, you’ll never have to think about a lot of this stuff anyway.

It’s only when developers are forced to think about things that things start to go wrong. I don’t mean to pick on developers specifically. It’s the human condition. Thinking often leads to poor choices.

In this post, I’m going to give you some basic guidance on when to use various T-SQL facilities, based on years of finding, fixing, and writing queries.

Some of the details and information may not surprise the more seasoned and spiced of you out there.

Here’s a piece of advice that I give everyone: Always start with a SELECT. I don’t care if the final form of your query is going to be an insert, update, or delete (I do care if it’s going to be a merge, because ew), you should always start off by writing a select, so you can validate query results first. It’s easy enough to change things over when you’re done, but please make sure what you’re changing is what you expect to change. I’d even go one step further and say that the first time you run your modification query, you should do it in a transaction with a ROLLBACK command.

I’ll usually do some variation on this, so I can see inserted and deleted results easily:

BEGIN TRANSACTION
    UPDATE TOP (100)
        u
    SET u.Reputation += 1000
    OUTPUT
        'D' AS d, Deleted.*,
        'I' AS i, Inserted.*
    FROM dbo.Users AS u
    WHERE u.Reputation < 1000
    AND   u.Reputation > 1;
ROLLBACK TRANSACTION;

Anyway, on to the cheat codes.

Inner Joins

Joins combine data horizontally (sideways, for the forgetful). The most basic thing you can do with two tables in a database, really.

The important thing to remember is that in one-to-many, and many-to-many relationships, joins will display duplicate matched values.

If you don’t need to show data from another table, don’t use a join. We’ll talk about other options later, but please let this burn into your mind. The number of queries I’ve seen with needless DISTINCT instructions on them is nearing a decent pre-tax cash bonus.

Here’s an example of when a join is necessary. We want to get all of our Users with a Reputation over 500,000, and sum up the Score on all their Posts, plus figure out what kind of Post the points were awarded to.

SELECT
    u.Id,
    u.DisplayName,
    PostType =
        CASE
             p.PostTypeId
             WHEN 1
             THEN 'Question'
             WHEN 2
             THEN 'Answer'
             ELSE 'Other'
        END,
    TotalScore = SUM(p.Score)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
  ON p.OwnerUserId = u.Id
WHERE u.Reputation > 500000
GROUP BY
    u.Id,
    u.DisplayName,
    p.PostTypeId
ORDER BY
    TotalScore DESC;

Because we need multiple columns from the Posts table, we can’t just use a correlated subquery in the select list. Those only allow for one column or expression to be projected from the results.

Since this is an inner join, it restricts the results down only to matching rows. Now, it’s not really possible to get a Reputation over 1 without posting things that other users can vote on, so it doesn’t make sense to use an outer join here.

What if we wanted to find slightly different data?

(Left) Outer Joins

Let’s say we wanted to generate a report of people whose Reputation is sitting at one (the site minimum), to figure out if they’re inactive, unpopular, or if their account has been suspended for some reason.

We could use a query like this to do it.

SELECT
    u.Id,
    u.DisplayName,
    u.Reputation,
    TotalScore = SUM(p.Score),
    c = COUNT_BIG(p.Id)
FROM dbo.Users AS u
LEFT JOIN dbo.Posts AS p
  ON p.OwnerUserId = u.Id
WHERE u.Reputation = 1
GROUP BY 
    u.Id,
    u.DisplayName,
    u.Reputation
ORDER BY
    TotalScore;

Before talking about the logic, it’s important to note that when you’re counting rows from the outer side of a join, you’ll usually wanna specify a non-nullable column to pass into the counting function, rather than (*), so you don’t incorrectly count NULL values.

Primary key columns are your friend for this, but any non-NULLable column will do.

We need a left join here, because we want everyone with a Reputation of 1, not just those users who have posted. The left join preserves rows from the Users table in that case.

The results we get back find all sorts of interesting things (that I told you we were looking for):

Users who were very active, but then had their accounts suspended
Users who have posted, but were heavily downvoted
Users who haven’t posted at all

sql server query results — bad, ugly, lazy

I’m not going to talk about right outer joins, because that’s the foolish domain of characterless buffoons who use Venn diagrams to explain join results.

I assume they have good intentions, they just lack the backbone to tell you that there is no natural reason to ever use a right join, that isn’t better logically expressed in a different way.

They’re usually trying to sell you something.

(Full) Outer Joins

In short, these preserve results from both tables, but still with a correlation. I’d nearly put these in the same category as right joins, except they have a couple decent use cases, and aren’t personally offensive to polite society.

Let’s say we want to figure out how many Posts don’t have an associated User, and how many Users don’t have an associated Post all in one query:

SELECT
    PostsWithoutAUser = 
        SUM(CASE WHEN u.Id IS NULL THEN 1 ELSE 0 END),
    UsersWithoutAPost = 
        SUM(CASE WHEN p.Id IS NULL THEN 1 ELSE 0 END)
FROM dbo.Users AS u
FULL JOIN dbo.Posts AS p
  ON p.OwnerUserId = u.Id;

It’s sort of an exception report, to let you know just how much referential integrity your data lacks.

Aside from oddball situations, you shouldn’t have to think much about these in your day to day life.

Cross Joins

Like full joins, I don’t see cross joins used terribly often, though they do have some uses, like populating a grid.

A reasonably worded example would be something like: you have a table of scotch, and a table of glass sizes, and you want to show someone all possible combinations of scotch and glass sizes.

If you pick a big enough glass, eventually using cross joins in more creative ways will seem like a good idea. One place I’ve been forced to use them is in some of my stored procedures, like sp_PressureDetctor.

Here’s one example:

SELECT
    sample_time =
        CONVERT
        (
            datetime,
            DATEADD
            (
                SECOND,
                (t.timestamp - osi.ms_ticks) / 1000,
                SYSDATETIME()
            )
        ),
    sqlserver_cpu_utilization =
        t.record.value('(Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]','int'),
    other_process_cpu_utilization =
        (100 - t.record.value('(Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]','int')
         - t.record.value('(Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]','int')),
    total_cpu_utilization =
        (100 - t.record.value('(Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int'))
FROM sys.dm_os_sys_info AS osi
CROSS JOIN
(
    SELECT
        dorb.timestamp,
        record =
            CONVERT(xml, dorb.record)
    FROM sys.dm_os_ring_buffers AS dorb
    WHERE dorb.ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
) AS t
WHERE t.record.exist('(Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization[.>= sql:variable("@cpu_utilization_threshold")])') = 1
ORDER BY
    sample_time DESC;

The sys.dm_os_sys_info view is a single row, with no relation at all to sys.dm_os_ring_buffers, but I need to use the one value in the one column in the one row for every row that it produces, so that I can turn the timetable column into a human-understandable value.

Here’s another example from the same procedure, slightly abridged:

SELECT
    total_threads =
        MAX(osi.max_workers_count),
    used_threads =
        SUM(dos.active_workers_count),
    available_threads =
        MAX(osi.max_workers_count) - SUM(dos.active_workers_count),
    threads_waiting_for_cpu =
        SUM(dos.runnable_tasks_count),
    requests_waiting_for_threads =
        SUM(dos.work_queue_count),
    current_workers =
        SUM(dos.current_workers_count),
    total_active_request_count =
        SUM(wg.active_request_count),
    total_queued_request_count =
        SUM(wg.queued_request_count),
    total_blocked_task_count =
        SUM(wg.blocked_task_count),
    total_active_parallel_thread_count =
        SUM(wg.active_parallel_thread_count),
    avg_runnable_tasks_count =
        AVG(dos.runnable_tasks_count)
FROM sys.dm_os_schedulers AS dos
CROSS JOIN sys.dm_os_sys_info AS osi
CROSS JOIN
(
    SELECT
        wg.active_request_count,
        wg.queued_request_count,
        wg.blocked_task_count,
        wg.active_parallel_thread_count
    FROM sys.dm_resource_governor_workload_groups AS wg      
) AS wg;

In this case, I keep myself safe from exploding result sets by aggregating all of the selected columns. You may also find that necessary, should you choose to work with data so terrible that it requires cross joins.

One thing to be especially aware of is that cross joins can only be physically implemented in SQL Server with a nested loops join, so the larger your tables get, the worse performance will get.

Beware out there.

Thanks for reading!

Going Further

Join me at DataTune in Nashville, March 8-9 2024

Posted on February 26, 2024January 16, 2024 by Erik Darling

Spring Training

This March, I’ll be presenting my full day training session The Foundations Of SQL Server Performance Tuning.

All attendees will get free access for life to my SQL Server performance tuning training. That’s about 25 hours of great content.

Get your tickets here for this event, taking place Friday, March 8th-9th 2024 at Belmont University – Massey Center 1900 Belmont Blvd, Nashville, TN 37212

Here’s what I’ll be presenting:

The Foundations Of SQL Server Performance Tuning

Session Abstract:

I work with consulting clients worldwide fixing complex SQL Server performance problems. I want to teach you how to do the same thing using the same troubleshooting tools and techniques I do.

Before you get to the cutting edge, you need to have a good foundation. I’m going to teach you how to find and fix performance problems with confidence.

SQL Server Index Design: Getting Key Column Order Right

Going Further

How To Evaluate Index Effectiveness While Tuning SQL Server Queries

Going Further

How To Tell If Lazy Table Spools Are Slowing Your SQL Server Queries Down, And Fix Them

Going Further

Spring Training

The Foundations Of SQL Server Performance Tuning

Event Details:

Going Further

Why Multiple Seeks Are Slower For Some SQL Server Queries

Going Further

Selectively Numb

Your First Mistakes

What EXISTS Does Different

Your Second Mistakes

Gear Up

Going Further

SSMS Right Click To Open Execution Plans In A New Tab

Going Further

So Many Choices

Inner Joins

(Left) Outer Joins

(Full) Outer Joins

Cross Joins

Going Further

Spring Training

The Foundations Of SQL Server Performance Tuning

Event Details:

Going Further

What SQL Server’s Query Optimizer Doesn’t Know About Numbers

Going Further