Execution Plans – Page 8

Starting SQL: Analyzing SQL Server Query Plans

Posted on August 2, 2021May 16, 2022 by Erik Darling

Starting SQL: Analyzing Query Plans

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Starting SQL: What’s In A SQL Server Query Plan

Posted on July 30, 2021May 16, 2022 by Erik Darling

Starting SQL: What’s In A Query Plan

Going Further

Starting SQL: SARGability, Or Why Some SQL Server Queries Will Never Seek

Posted on July 9, 2021May 16, 2022 by Erik Darling

Starting SQL: SARGability

Going Further

Starting SQL: Analyzing SQL Server Query Plans

Posted on July 8, 2021May 16, 2022 by Erik Darling

Starting SQL: Query Plans

Going Further

Starting SQL: Query Execution In SQL Server

Posted on July 6, 2021May 16, 2022 by Erik Darling

Starting SQL: Query Execution

Going Further

Common Table Expressions Are Useful For Rewriting Scalar Functions In SQL Server

Posted on June 23, 2021May 16, 2022 by Erik Darling

Punching

Thanks for watching!

Going Further

Why Comparing Columns Doesn’t Always Seek In SQL Server

Posted on June 22, 2021May 16, 2022 by Erik Darling

Adjoining

When writing queries, sometimes you have the pleasure of being able to pass a literal value, parameter, or scalar expression as a predicate.

With a suitable index in place, any one of them can seek appropriately to the row(s) you care about.

But what about when you need to compare the contents of one column to another?

It gets a little bit more complicated.

All About Algorithms

Take this query to start with, joining Users to Posts.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id;

The OwnerUserId column doesn’t have an index on it, but the Id column on Users it the primary key and the clustered index.

But the type of join that’s chosen is Hash, and since there’s no where clause, there’s no predicate to apply to either table for filtering.

This is complicated slightly by the Bitmap, which is created on the OwnerUserId column from the Posts table and applied to the Id column from the Users table as an early filter.

However, it’s a useless Bitmap, and Bitmaps don’t really seek anyway.

The same pattern can generally be observed with Merge Joins. Where things are a bit different is with Nested Loops.

Shoop Da Loop

If we use a query hint, we can see what would happen with a Nested Loops Join.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
OPTION(LOOP JOIN);

The plan looks like this now, with a Seek on the Users table.

The reason is that this flavor of Nested Loops, known as Apply Nested Loops, takes each row from the outer input and uses it as a scalar operator on the inner input.

An example of Regular Joe Nested Loops™ looks like this:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.Reputation = 
(
    SELECT
        MIN(p.Score)
    FROM dbo.Posts AS p
);

Where the predicate is applied at the Nested Loops operator:

Like most things, indexing is key, but there are limits.

Innermost

Let’s create this index:

CREATE INDEX ud ON dbo.Users(UpVotes, DownVotes);

And run this query:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.UpVotes = u.DownVotes;

The resulting query plan looks like this:

But what other choice is there? If we want a seek, we need a particular thing or things to seek to.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.UpVotes = u.DownVotes
AND   u.UpVotes = 1;

We seek to everyone with an UpVote of 1, and then somewhat awkwardly search the DownVotes column for values >= 1 and <= 1.

But again, these are specific values we can search for.

Thanks for reading!

Going Further

Common Query Plan Patterns: Introduction

Posted on March 30, 2021May 16, 2022 by Erik Darling

Muy Bueno

I’m writing this series because as interesting as single operators can be, you rarely run into interesting query plans that are a single operator. I don’t know exactly how many posts this will end up being. I have a list of about 15 things that I’d like to write about.

Anyway, I’d been thinking about something like this for a while, because being able to understand which part of a query generates which part of a query plan can help you focus in on what you need to work on. It’s also helpful to understand how different aspects of your database design and written queries might manifest in query plans.

You know, for performance.

Anyway, I hope you enjoy it. This post will be remarkably empty on first read, but will populate over time.

Thanks for reading!

Going Further

Table Partitioning And Parallelism In SQL Server Standard Edition

Posted on March 22, 2021May 16, 2022 by Erik Darling

Jammed, Not Jellied

UPDATE 2021-04-14: Microsoft has updated the documentation for all 2016+ versions of SQL Server to indicate that parallelism is available for partitioned tables in non-Enterprise versions.

There’s been a note in the documentation since SQL Server 2016 SP1 brought certain Programmability Features™ to Standard Edition with regard to parallelism differences between Standard and Enterprise Edition.

For the sake of completeness, I did all my testing across both Standard and Developer Editions of SQL Server and couldn’t detect a meaningful difference.

Additionally, documentation about Parallel Query Execution Strategy for Partitioned Objects doesn’t note any differences in strategy between the two Editions.

There may be scenarios outside of the ones I tested that do show a difference, but, uh. I didn’t test those.

Obviously.

Every table is going to test this query at different DOPs.

SELECT
    DATEPART(YEAR, vp.CreationDate) AS VoteYear,
    DATEPART(MONTH, vp.CreationDate) AS VoteMonth,
    COUNT_BIG(DISTINCT vp.PostId) AS UniquePostVotes,
    SUM(vp.BountyAmount) AS TotalBounties
FROM dbo.Votes_p AS vp
GROUP BY
    DATEPART(YEAR, vp.CreationDate),
    DATEPART(MONTH, vp.CreationDate);

Two Partitions

Here’s the setup:

CREATE PARTITION FUNCTION VoteYear2013_pf(DATETIME)
    AS RANGE RIGHT FOR VALUES
    (
        '20130101'
    );
GO


CREATE PARTITION SCHEME VoteYear2013_ps
    AS PARTITION VoteYear2013_pf
    ALL TO ([PRIMARY]);

DROP TABLE IF EXISTS dbo.Votes2013_p;

CREATE TABLE dbo.Votes2013_p
(
    Id int NOT NULL,
    PostId int NOT NULL,
    UserId int NULL,
    BountyAmount int NULL,
    VoteTypeId int NOT NULL,
    CreationDate datetime NOT NULL,
    CONSTRAINT PK_Votes2013_p_Id
        PRIMARY KEY CLUSTERED (CreationDate, Id)
) ON VoteYear2013_ps(CreationDate);


INSERT dbo.Votes2013_p WITH(TABLOCK)
    (Id, PostId, UserId, BountyAmount, VoteTypeId, CreationDate)
SELECT v.Id,
       v.PostId,
       v.UserId,
       v.BountyAmount,
       v.VoteTypeId,
       v.CreationDate
FROM dbo.Votes AS v;

The data split looks like this:

SQL Server Query Results — not a good use of partitioning

Running our test query at DOP 4, there are slight differences in counts across threads, but slight timing differences can explain that.

Standard Edition is on top, Developer Edition is at the bottom. There is a ~200ms difference here, but averaged out over multiple runs things end up pretty dead even.

Even looking at the row counts per thread, the distribution is close across both versions. I think it’s decently clear that the four threads work cooperatively across both partitions. A similar pattern continues at higher DOPs, too. I tested 8 and 16, and while there were slight differences in row counts per thread, there was a similar distribution pattern as at DOP 4.

Eight Partitions

Using a different partitioning function:

CREATE PARTITION FUNCTION VoteYear_pf(DATETIME)
    AS RANGE RIGHT FOR VALUES
    (
        '20080101',
        '20090101',
        '20100101',
        '20110101',
        '20120101',
        '20130101',
        '20140101'
    );
GO

We’re going to jump right to testing the query at DOP 8.

Again, different threads end up getting assigned the work, but row counts match exactly across threads that did get work, and those numbers line up exactly to the number of rows in each partition.

In both queries, two threads scanned a partition with no rows and did no work. Each thread that did scan a partition scanned only one partition.

At DOP 16, the skew gets a bit worse, because now four threads do no work.

The remaining threads all seem to split the populated partitions evenly, though again there are slight timing differences that result in different row counts per thread, but it’s pretty clear that there is cooperation here.

At DOP 4, things get a bit more interesting.

In both queries, two threads scan exactly one partition.

The rows with arrows pointing at them represent numbers that exactly match the number of rows in a single partition.

The remaining threads have exactly the same row counts across versions.

Fifteen Partitions

The results here show mostly the same pattern as before, so I’m keeping it short.

CREATE PARTITION FUNCTION VoteYear16_pf(DATETIME)
    AS RANGE RIGHT FOR VALUES
    (
        '20080101',
        '20080601',
        '20090101',
        '20090601',	    
        '20100101',
        '20100601',
        '20110101',
        '20110601',
        '20120101',
        '20120601',
        '20130101',
        '20130601',
        '20140101',
        '20140601'
    );
GO

At DOP 4 and 8, threads work cooperatively across partitions. Where things get interesting (sort of) is at DOP 16.

The four empty partitions here result in 4 threads doing no work in Developer/Enterprise Edition, and 5 threads doing no work in Standard Edition.

At first, I thought this might be a crack in the case, so I did things a little bit differently. In a dozen or so runs, the 5 empty threads only seemed to occur in the Standard Edition query. Sometimes it did, sometimes it didn’t. But it was at least something.

Fifteen Partitions, Mostly Empty

I used the same setup as above, but this time I didn’t fully load data from Votes in:

INSERT dbo.Votes16e_p WITH(TABLOCK)
    (Id, PostId, UserId, BountyAmount, VoteTypeId, CreationDate)
SELECT v.Id,
       v.PostId,
       v.UserId,
       v.BountyAmount,
       v.VoteTypeId,
       v.CreationDate
FROM dbo.Votes AS v
WHERE v.CreationDate >= '20130101';

And… Scene!

That’s Just Great

Aside from one case where an extra thread got zero rows in Standard Edition, the behavior across the board looks the same.

Most of the behavior is sensible, but cases where multiple threads get no rows and don’t move on to other partitions is a little troubling.

Not that anyone has partitioning set up right anyway.

Thanks for reading!

Going Further

A Parameterization Performance Puzzle With TOP PERCENT in SQL Server

Posted on March 19, 2021May 16, 2022 by Erik Darling

Lawdy

There was a three-part series of posts where I talked about a weird performance issue you can hit with parameterized top. While doing some query tuning for a client recently, I ran across a funny scenario where they were using TOP PERCENT to control the number of rows coming back from queries.

With a parameter.

So uh. Let’s talk about that.

Setup Time

Let’s start with a great index. Possibly the greatest index ever created.

CREATE INDEX whatever 
ON dbo.Votes
    (VoteTypeId, CreationDate DESC)
WITH
(
    MAXDOP = 8,
    SORT_IN_TEMPDB = ON
);
GO

Now let me show you this stored procedure. Hold on tight!

CREATE OR ALTER PROCEDURE dbo.top_percent_sniffer 
(
    @top bigint, 
    @vtid int
)
AS
SET NOCOUNT, XACT_ABORT ON;
BEGIN
    SELECT TOP (@top) PERCENT
        v.*
    FROM dbo.Votes AS v
    WHERE v.VoteTypeId = @vtid
    ORDER BY v.CreationDate DESC;

END;

Cool. Great.

Spool Hardy

When we execute the query, the plan is stupid.

EXEC dbo.top_percent_sniffer
    @top = 1,
    @vtid = 6;
GO

We don’t use our excellent index, and the optimizer uses an eager table spool to hold rows and pass the count to the TOP operator until we hit the correct percentage.

This is the least ideal situation we could possibly imagine.

Boot and Rally

A while back I posted some strange looking code on Twitter, and this is what it ended up being used for (among other things).

getting weird. pic.twitter.com/lRVQQr81Yi

— Erik Darling Data (@erikdarlingdata) February 27, 2021

The final version of the query looks like this:

CREATE OR ALTER PROCEDURE dbo.top_percent_sniffer 
(
    @top bigint, 
    @vtid int
)
AS
SET NOCOUNT, XACT_ABORT ON;
BEGIN;
    
    WITH pct AS
    (
        SELECT
            records = 
                CONVERT(bigint, 
                    CEILING(((@top * COUNT_BIG(*)) / 100.)))
        FROM dbo.Votes AS v
        WHERE v.VoteTypeId = @vtid
    )
    SELECT
        v.*
    FROM pct
    CROSS APPLY
    (
        SELECT TOP (pct.records)
            v.*
        FROM dbo.Votes AS v
        WHERE v.VoteTypeId = @vtid
        ORDER BY v.CreationDate DESC
    ) AS v;

END;
GO

Soul Bowl

This definitely has drawbacks, since the expression in the TOP always gives a 100 row estimate. For large numbers of rows, this plan could be a bad choice and we might need to do some additional tuning to get rid of that lookup.

There might also be occasions when using a column store index to generate the count would be benefit, and the nice thing here is that since we’re accessing the table in two different ways, we could use two different indexes.

But for reliably small numbers of rows, this is a pretty good solution.

Thanks for reading!