Performance Measures And Factors

Start To Finish


When I’m working with clients, we look at a lot of stuff together to make sure things are in good working order.

If we’re tuning a single query, the goal is pretty easy to define. We want it to finish as fast as possible.

How you get there is where things may get interesting, but let’s be honest: most of the time you’re going to be fixing the same half a dozen problems in some combination.

Note that the query finishing, and results being finished returning are two different metrics. A query may finish very quickly, but returning results may take considerably longer for various reasons.

You can think of this as responsiveness, and it’s a cornerstone to building something people are happy with. When you’re talking about user experience, performance has to be part of the conversation.

How fast individual queries finish also plays a big part in overall concurrency.

Put Through


Typically this matters more for OLTP workloads. Data warehouses aren’t usually high concurrency environments in that the Batch Requests/Sec counter is sitting around bored. Joe Obbish has a great talk on improving data warehouse loading patterns, because usually isn’t always. Ha ha ha. Barf.

Workloads that do have OLTP characteristics are more sensitive to all sorts of things, and I don’t just mean parameters. Locking and Deadlocking are high up on the list, along with available hardware resources.

Rather obviously, if you want to improve overall throughput, having queries finish as quickly as possible is an attractive goal. You can run way more queries in one second if they take 100ms than if they take 900ms, you know? You know. You’re smart.

Factorials


What are the factors that influence query speed?

Starting in the database:

  • Well written queries
  • Thoughtful indexes
  • Maintained Statistics
  • Correctly stored data types
  • Properly normalized tables

Moving out a little bit further, there are some important settings:

  • Parallelism
  • Optimistic Isolation Levels
  • Max Server Memory
  • tempdb stuff
  • Instant File Initialization
  • Lock Pages In Memory (mostly)

Further out, we have the hardware:

  • CPU
  • Memory
  • Disk
  • Network Path To Disks (SAN)

A lot of times something being off in one place can make it look like there’s a bottleneck in another place. Probably the easiest example to conjure up is if you’re missing an opportune nonclustered index, and you end up scanning a clustered index over and over again. The clustered index doesn’t fit in memory, so you end up generating a lot of PAGEIOLATCH_SH waits.

It might look like you have a memory shortage, but in reality you could be making far better use of the memory you have.

Margin For Error


Being able to identify and fix specific bottlenecks is an important skill. Wait stats and other counters can be useful, but are often underwhelming unless you’re watching at the right time, or you have a monitoring tool that logs them for you. Bottlenecks can live in many places, and oftentimes performance tuning is like playing whack-a-mole with them.

Hitting hard limits and running out of CPU or RAM is never fun. Picking hardware that’s up to the task is step one, but staying on top of query and index tuning to make sure it stays adequate for longer is a process that you need to go through. If this is the sort of thing you need help with, drop me a line.

Thanks for reading!

Tracking Row Changes With Temporal Columns

Extra, Extra


Temporal tables are cool, but keeping all that history can be stressful. Change Tracking adds overhead to every transaction, and requires Snapshot Isolation to be successful. Change Data Capture can also run into problems scanning the transaction log, depending on transaction volume, etc.

Change Data Capture is also a little unfortunate in that it doesn’t track schema changes like adding or dropping columns, or changing column data types. Change Tracking doesn’t either, it just tracks the keys of what changed when data is modified.

Temporal Tables are different, because it will add new, and drop removed columns from the history table as they change in the base table. It’ll also push data type changes across. But the history table will not have the various permutations tracked. It just mirrors current state.

Just The Columns, Ma’am


Let’s say your needs aren’t robust enough to need any one of those highly specialized features, or even triggers to move data around when it changes.

You’re perfectly free and able to add the tracking columns that temporal tables use to your base tables, but adding them is far from free. When I added them to the 17 million row Posts table, it took about 40 seconds. My laptop doesn’t suck, either.

To show you a little how it works, let’s create a copy of the Votes table from Stack Overflow.

CREATE TABLE dbo.Votes_Tracked
(
    Id int NOT NULL,
    PostId int NOT NULL,
    UserId int NULL,
    BountyAmount int NULL,
    VoteTypeId int NOT NULL,
    CreationDate datetime NOT NULL,
    LastModified datetime2 GENERATED ALWAYS AS ROW start NOT NULL
        CONSTRAINT DF_LastModified DEFAULT (SYSDATETIME()),
    JunkDate datetime2 GENERATED ALWAYS AS ROW END HIDDEN NOT NULL
        CONSTRAINT DF_JunkDate DEFAULT ('9999-12-31 23:59:59.9999999'),
    PERIOD FOR SYSTEM_TIME (LastModified, JunkDate),
    CONSTRAINT PK_Votes_Id
        PRIMARY KEY CLUSTERED (Id ASC)
);
GO

Note that you need two columns to define the “period for system time”, and one of them will always be useless. That’s why I called it JunkDate, and not, like, whatever. But the good news is you can define that column as HIDDEN so that it doesn’t show up in all your queries.

Now we can stick some data in there and see how it works.

INSERT 
    dbo.Votes_Tracked WITH (TABLOCK)
(
    Id,
    PostId,
    UserId,
    BountyAmount,
    VoteTypeId,
    CreationDate
)
SELECT
    Id,
    PostId,
    UserId,
    BountyAmount,
    VoteTypeId,
    CreationDate
FROM StackOverflow2013.dbo.Votes AS v
WHERE v.CreationDate >= '20130101';

Looking Around


The table data looks like this:

woah man

If we run an update:

UPDATE v
    SET v.BountyAmount = 9999
FROM dbo.Votes_Tracked AS v
WHERE v.VoteTypeId = 7;
temporally yours

Note that these screen caps were taken without the HIDDEN keyword added to the table definition: that was an after thought recommended by my friend Peter.

Whatabouts?


Of course, if you remove rows from the table, they’re just gone. You’d still need a trigger to cover deletes, if you need to track those.

And if you want to remove those columns later, it takes a little bit of tweaking.

ALTER TABLE dbo.Votes_Tracked DROP CONSTRAINT DF_LastModified, DF_JunkDate;
ALTER TABLE dbo.Votes_Tracked DROP COLUMN JunkDate, LastModified;

Msg 13588, Level 16, State 1, Line 63
Column 'JunkDate' in table 'Crap.dbo.Votes_Tracked' cannot be dropped because it is a part of period definition.

Of course, the table isn’t system versioned, so this command will also fail:

ALTER TABLE dbo.Votes_Tracked SET (SYSTEM_VERSIONING = OFF);

Msg 13591, Level 16, State 1, Line 66
SYSTEM_VERSIONING is not turned ON for table 'Crap.dbo.Votes_Tracked'.

If you want to remove them, you’ll need to use this:

ALTER TABLE dbo.Votes_Tracked DROP PERIOD FOR SYSTEM_TIME;

Now you’ll be able to remove them.

Is This A Good Idea?


Well, it depends on what you need. If you just need to know when a row changed, and you don’t need to know what changed or who changed it, it can work in a more automated way than triggers. It does require an additional column, which isn’t ideal, but it’s not a gigantic column, and you don’t need to worry about indexing it because the data is junk.

I haven’t found any ridiculous downsides to this, but I’ll keep this updated if I do.

Thanks for reading!

Considerations For Implementing Soft Deletes

From The Beginning


Implementing soft deletes for an app that’s been around for a while can be tough. In the same way as implementing Partitioning can be tough to add in later to get data management value from (rebuilding clustered indexes on the scheme, making sure all nonclustered indexes are aligned, and all future indexes are too, and making sure you have sufficient partitions at the beginning and end for data movement).

Boy, I really stressed those parentheses out.

If you do either one from the outset, it’s far less painful to manage. The structural stuff is there for you from the beginning, and you can test different strategies early on before data change become difficult to manage.

Queries


The first and most obvious thing is that all your queries now need to only find data that isn’t deleted.

Almost universally, it’s easier to put views on top of tables that have the appropriate bit search for deleted or not deleted rows than to expect people to remember it.

CREATE VIEW dbo.Users_Active
AS
SELECT 
    u.*
FROM dbo.Users AS u
WHERE u.is_deleted = 0;

CREATE VIEW dbo.Users_Inactive
AS
SELECT 
    u.*
FROM dbo.Users AS u
WHERE u.is_deleted = 1;

It’s not that views have any magical performance properties; they’re just queries after all, but it gives you an explicit data source.

Indexes


Depending on how your other queries search for data, you may need to start accounting for the is_deleted flag in your indexes. This could make a really big difference if the optimizer stops choosing your narrower nonclustered indexes because it hates key lookups.

Typically, other predicates will give you a selective-enough result set that a residual predicate on a bit field won’t make much difference. If you’ve already got a seek to the portion of data you’re interested in and most of it will be not-deleted, who cares?

And let’s be honest, in most implementations deleted rows will be the minority of data, and searches for it will be far less common. Usually it’s just there for an occasional audit.

In adjacent cases where instead of deleted you need to designate things as currently active, and you may have many inactive rows compared to active rows, filtered indexes can be your best friend.

Coming back to the views, I don’t think that making them indexed is necessary by default, but it might be if you’re using forced parameterization and filtered indexes.

CREATE TABLE dbo.Users(id int, is_deleted bit);
GO 

CREATE INDEX u ON dbo.Users (id) WHERE is_deleted = 0;
GO

SELECT 
    u.id, u.is_deleted
FROM dbo.Users AS u
WHERE u.is_deleted = 0;

Under simple parameterization, this can be fine. Under forced parameterization, things can get weird.

tutor the tutors

Tables and Tables


In some cases, it might be easier to create tables specifically for deleted rows so you don’t have unnecessary data in your main tables. You can implement this easily enough with after triggers. Just make sure they’re designed to handle multiple rows.

If you want something out of the box, you might mess with:

  • Temporal tables
  • Change Data Capture
  • Change Tracking

However, none of those help you deal with who deleted rows. For that, you’ll need an Audit.

Thanks for reading!

Deduplicating Missing Index Requests Part 3

Dodo


We’ve got a set of missing index requests for a single table, and we’ve got the queries asking for them.

Going back to our queries and our index requests, all the queries have two things in common:

  • They filter on OwnerUserId
  • They order by Score

There are of course other elements in the where clause to attend to, but our job is to come up with one index that helps all of our queries.

Query Real Hard


To recap, these are our queries.

SELECT TOP (10)
    p.*
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
ORDER BY p.Score DESC;
GO 10

SELECT TOP (10) 
    p.*
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND   p.CreationDate >= '20130101'
ORDER BY p.Score DESC;

SELECT TOP (10)
    p.*
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND   p.PostTypeId = 1
ORDER BY p.Score DESC;

SELECT TOP (10)
    p.*
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND   p.LastActivityDate >= '20130101'
ORDER BY p.Score DESC;

SELECT TOP (10) 
    p.*
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND   p.Score > 0
ORDER BY p.Score DESC;

Index Real Dumb


Which means that all of our missing index requests are going to be on maybe a couple key columns, and then include every other column in the Posts table.

This is a bad idea, so we’re going to dismiss the includes and focus on keys.

CREATE INDEX [OwnerUserId_LastActivityDate_Includes] 
    ON [StackOverflow2013].[dbo].[Posts] ([OwnerUserId], [LastActivityDate]);

CREATE INDEX [OwnerUserId_Score_Includes] 
    ON [StackOverflow2013].[dbo].[Posts] ([OwnerUserId], [Score]);

CREATE INDEX [OwnerUserId_PostTypeId_Includes] 
    ON [StackOverflow2013].[dbo].[Posts] ([OwnerUserId], [PostTypeId]);

CREATE INDEX [OwnerUserId_CreationDate_Includes] 
    ON [StackOverflow2013].[dbo].[Posts] ([OwnerUserId], [CreationDate]);

CREATE INDEX [OwnerUserId_Includes] 
    ON [StackOverflow2013].[dbo].[Posts] ([OwnerUserId]);

Now that we’ve got a more sane bunch of requests to focus on, let’s do something thinking.

I hate thinking, so we won’t do a lot of it.

Indexes put data in order, and equality predicates preserve ordering of secondary index columns. That makes putting the key on (OwnerUserId, Score) a no-brainer. One could make an entire career out of avoiding sorting in the database.

But now we have three other columns to think about: LastActivityDate, PostTypeId, and CreationDate.

We could spend a whole lot of time trying to figure out the best order here, considering things like: equality predicates vs inequality predicates, and selectivity, etc.

But what good would it do?

Dirty Secret


No matter what order we might put index key columns in after Score, it won’t matter. Most of our queries don’t search on OwnerUserId and then Score. Only one of them does, and it doesn’t search on anything else.

That means that most of the time, we’d be seeking to OwnerUserId, and then performing residual predicates against other columns we’re searching on.

On top of that, we’d have whatever overhead there is of keeping things in order when we modify data in the key of the index. Not that included columns are free-of-charge to modify, but you get my point. There’s no order preserved in them.

In reality, a good-enough-index for the good-enough-optimizer to come up with a good-enough-plan looks like this:

CREATE INDEX good_enough
    ON dbo.Posts
        (OwnerUserId, Score)
    INCLUDE 
        (PostTypeId, CreationDate, LastActivityDate);

Planama


The index above does two things:

  • It helps us search on a selective predicate on OwnerUserId
  • It keeps Score in order after the quality so the order by is free
  • It has all the other potential filtering elements so we can apply predicates locally
  • It teaches us that include column order doesn’t matter

All of the query plans will look roughly like this, regardless of the where clause:

you can do it

What Difference Does It Make?


Alright, so we’ve got one good-enough index for a bunch of different queries. By adding the index, we got all of them to go from taking ~600ms to taking 0ms.

What else did we do?

  • We made them faster without going parallel
  • They no longer need memory to sort data

And we did it without creating a gigantic covering index.

Of course, the optimizer still thinks we need indexes…

of what?

But do we really need them?

No.

77% of nothing is nothing.

Thanks for reading!

Deduplicating Missing Index Requests Part 2

Deedoo


In yesterday’s post, we talked a little about different ways to approach looking at missing index requests, and how their benefit is calculated in sp_BlitzIndex.

If you’re on SQL Server 2019, you may be able to get some idea which queries are generating missing index requests. It’s not documented yet 🤔, but that’s never stopped anyone from using anything in production.

Let’s look at the queries asking for them.

Natural Ruckus


Let’s take a look at what queries are causing those missing index requests.

Since I’m running them, I don’t have to do any work.

SELECT TOP (10)
    p.*
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
ORDER BY p.Score DESC;

SELECT TOP (10) 
    p.*
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND   p.CreationDate >= '20130101'
ORDER BY p.Score DESC;

SELECT TOP (10)
    p.*
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND   p.PostTypeId = 1
ORDER BY p.Score DESC;

SELECT TOP (10)
    p.*
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND   p.LastActivityDate >= '20130101'
ORDER BY p.Score DESC;

SELECT TOP (10) 
    p.*
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND   p.Score > 0
ORDER BY p.Score DESC;

Without any helpful indexes, all of these take about the same amount of time — between 600 and 700ms, and most of the time is spent scanning the clustered index on Posts, which is around 17 million rows, just to return ~27k rows..

That’s Not “Slow”


Usually when people are complaining about slow queries, they’re talking about stuff that drags on for several seconds or longer.

At just about half a second, most people wouldn’t getting a running start to jump through hoops to make these faster.

But they all have a “high cost” of a touch over 3000 query bucks. If you’re the type of person who only focuses on one metric, you might be overlooking a whole lot of things that need attention.

spuriously yours

If we made this query 99% faster, it would mean a rather miniscule improvement in elapsed time. The flip side of the one-metric thing is using duration alone as a metric. Queries might have a very long duration because they’re blocked, but use minimal resource when they’re finally allowed to run by the gods of ACID compliance.

My favorite queries to find and tune are ones that:

  • Unnecessarily run for a long time, using a lot of CPU
  • Unnecessarily ask for large memory grants
  • Have issues with parameter sniffing

Let’s pretend there might be some value to tuning these, though. Maybe we’re upset that they’re going parallel. Maybe we have something against scanning clustered indexes. Maybe we just don’t have anything else to do.

In tomorrow’s post, we’ll look at how to take all of those requests and come up with one good enough-index for our queries.

Just like how the optimizer comes up with one “good enough” plan for all queries.

Thanks for reading!

Deduplicating Missing Index Requests Part 1

Church


I often find myself reviewing missing index requests during consulting engagements. Not because they’re so awesome, but because they’re often just good enough to provide some quick relief before more fine-tuned efforts are explored.

More to the point: if someone has no idea which queries they need to tune, and everything is pretty slow, this is a good starting place.

Given sufficient server uptime, of course.

World Tour


Sometimes you’ll see that slam-dunk missing index request with lots of uses, and you can tie it to a query that you know is bad. Of course, I’m quite partial to using sp_BlitzIndex to analyze indexes. There are a few different places that missing indexes will be detailed in.

  • Mode 0: the most important stuff
  • Mode 4: anything and everything
  • Mode 3: just missing index requests
  • Table Mode: analyzing just one table

The easiest way to find examples like I’ll be talking about is to look at just one table. In this case, the Posts table.

EXEC sp_BlitzIndex @TableName = 'Posts';

If you have missing index requests for a table, they’ll look something like this:

where you ack

That estimated benefit number is pretty big here, so it jumps out a bit. Normally I don’t start really paying attention until that number is >5 million. That’s not terribly scientific, but you have to draw the line somewhere.

Of course, one very sneaky thing to consider is when you have a set of duplicative requests with low-ish estimated benefit individually, but combined they just might add up to something quite useful.

Just A Kid


The estimated benefit number is just a function of the three feedback metrics that get logged with missing index requests: uses * impact * average query cost.

Uses is a fairly reliable metric, but impact and average query cost are a little more hand-wavy. Even high-cost queries can be very fast. It doesn’t mean that they can’t be tuned or don’t need indexes, but they might not be your worst-performers.

In tomorrow’s post, we’ll look at that, and how you can come up with a good-enough index for a bunch of similar queries.

Thanks for reading!

The Cascades Framework for Query Optimization at Microsoft (Nico Bruno + Cesar Galindo-Legaria)

All About Me


The fine folks over at the Carnegie Mellon Database Group have been putting on a series of talks about different databases, and they finally got around to my beloved SQL Server.

This is a really interesting talk, but don’t stop there. Be sure to check out their other videos. They’re a little more database agnostic, but still generally useful.

Also, Andy Pavlo is a cutie patootie.

 

Using Plan Guides To Get Around Query Hints

Prophesy As


According to Not-Australians, there used to be a trace flag that would get queries to ignore any supplied hints. It doesn’t work anymore, which sucks, kinda.

Because people do lots of stupid things with hints. Real stupid things. Things you wouldn’t believe the stupid of.

Let’s say, for example, hypothetically of course, that your front end application would add an index hint to every query.

That index hint may or not be helpful to your query in any way. But there it is.

Let’s also posit, using the very depths of our imaginations, that the front end developer was unlikely to change that behavior.

Planning Fields


We’ve got a couple indexes:

CREATE INDEX r 
    ON dbo.Users(Reputation) 
WITH(MAXDOP = 8, SORT_IN_TEMPDB = ON);

CREATE INDEX c 
    ON dbo.Users(CreationDate) 
WITH(MAXDOP = 8, SORT_IN_TEMPDB = ON);

And we’ve got a query that, via an index hint, is being forced to use the wrong index.

DECLARE @Reputation int = 2;
EXEC sp_executesql N'SELECT * FROM dbo.Users WITH (INDEX  = c) WHERE Reputation = @Reputation;',
                   N'@Reputation int',
                   @Reputation;

The ensuing query plan makes no sense whatsoever.

i really mean it

The things are all backwards. We scan the entire nonclustered index, and do a lookup to the clustered index just to evaluate the @Reputation predicate.

The idea is bad. Please don’t do the idea.

Guiding Bright


There are two things we could do here. We could hint the query to use the index we want, sure.

But what if we change something about this index, or add another one to the table? We might want the optimizer to have a bit more freedom to choose.

I mean, I know. That has its own risks, but whatever.

We can add a plan guide that looks like this:

EXEC sp_create_plan_guide
@name = N'dammit',
@stmt = N'SELECT * FROM dbo.Users WITH (INDEX  = c) WHERE Reputation = @Reputation;',
@type = N'SQL',
@module_or_batch = NULL,
@params = N'@Reputation int',
@hints =  N'OPTION(TABLE HINT(dbo.Users))';

If we were writing proper queries where tables are aliased, it’d look like this:

EXEC sp_create_plan_guide
@name = N'dammit',
@stmt = N'SELECT u.* FROM dbo.Users AS u WITH (INDEX  = c) WHERE u.Reputation = @Reputation;',
@type = N'SQL',
@module_or_batch = NULL,
@params = N'@Reputation int',
@hints =  N'OPTION(TABLE HINT(u))';

When we re-run our query, things look a lot better:

focus

Thanks for reading!

A Word From Our Sponsors


First, a huge thank you to everyone who has bought my training so far. You all are incredible, and I owe all of you a drink.

Your support means a lot to me, and allows me to do nice stuff for other people, like give training away for free.

So far, I’ve donated $45k (!!!) worth of training to folks in need, no questions asked.

Next year, I’d like to keep doing the same thing. I’d also like to produce a whole lot more training to add value to the money you spend. In order to do that, I need to take time off from consulting, which isn’t easy to do. I’m not crying poor, but saying no to work for chunks of time isn’t easy for a one-person party.

I’m hoping that I can make enough in training bucks to make that possible.

Because this sale is extra sale-y, I’ve decided to name it after the blackest black known to man.

From today until December 31st, you can get all 25 hours of my recorded training content for just $100.00. If you click the link below to add everything to your cart, and use the discount code AllFor100 to apply a discount to your cart.

Everything

Everything

Everything

Some fine print: It only works if you add EVERYTHING. It’s a fixed amount discount code that you need to spend a certain amount to have kick in.

Thank for reading, and for your support.

A Where Clause Problem Recompile Doesn’t Fix

Fast 1


After blogging recently (maybe?) about filters, there was a Stack Exchange question about a performance issue when a variable was declared with a max type.

After looking at it for a minute, I realized that I had never actually checked to see if a recompile hint would allow the optimizer more freedom when dealing with them.

CREATE INDEX u 
    ON dbo.Users(DisplayName);

DECLARE @d nvarchar(MAX) = N'Jon Skeet';

SELECT 
    COUNT_BIG(*) AS records
FROM dbo.Users AS u
WHERE u.DisplayName = @d;

SELECT 
    COUNT_BIG(*) AS records
FROM dbo.Users AS u
WHERE u.DisplayName = @d
OPTION(RECOMPILE);

Turns out that it won’t, which is surprising.

happy cheese

Even though both plans have sort of a weird seek, the filter operator remains as a weird sort of residual predicate.

truly try me

Thanks for reading!

A Word From Our Sponsors


First, a huge thank you to everyone who has bought my training so far. You all are incredible, and I owe all of you a drink.

Your support means a lot to me, and allows me to do nice stuff for other people, like give training away for free.

So far, I’ve donated $45k (!!!) worth of training to folks in need, no questions asked.

Next year, I’d like to keep doing the same thing. I’d also like to produce a whole lot more training to add value to the money you spend. In order to do that, I need to take time off from consulting, which isn’t easy to do. I’m not crying poor, but saying no to work for chunks of time isn’t easy for a one-person party.

I’m hoping that I can make enough in training bucks to make that possible.

Because this sale is extra sale-y, I’ve decided to name it after the blackest black known to man.

From today until December 31st, you can get all 25 hours of my recorded training content for just $100.00. If you click the link below to add everything to your cart, and use the discount code AllFor100 to apply a discount to your cart.

Everything

Everything

Everything

Some fine print: It only works if you add EVERYTHING. It’s a fixed amount discount code that you need to spend a certain amount to have kick in.

Thank for reading, and for your support.

Annoyances When Indexing For Windowing Functions

One Day


I will be able to not care about this sort of thing. But for now, here we are, having to write multiple blogs in a day to cover a potpourri of grievances.

Let’s get right to it!

First, without a where clause, the optimizer doesn’t think that an index could improve one single, solitary metric about this query. We humans know better, though.

WITH Votes AS 
(
    SELECT
        v.Id,
        ROW_NUMBER() 
            OVER(PARTITION BY 
                     v.PostId 
                 ORDER BY 
                     v.CreationDate) AS n
    FROM dbo.Votes AS v
)
SELECT *
FROM Votes AS v
WHERE v.n = 0;

The tough part of this plan will be putting data in order to suit the Partition By, and then the Order By, in the windowing function.

Without any other clauses against columns in the Votes table, there are no additional considerations.

Two Day


What often happens is that someone wants to add an index to help the windowing function along, so they follow some basic guidelines they found on the internet.

What they end up with is an index on the Partition By, Order By, and then Covering any additional columns. In this case there’s no additional Covering Considerations, so we can just do this:

CREATE INDEX v2 ON dbo.Votes(PostId, CreationDate);

If you’ve been following my blog, you’ll know that indexes put data in order, and that with this index you can avoid needing to physically sort data.

limousine

Three Day


The trouble here is that, even though we have Cost Threshold For Parallelism (CTFP) set to 50, and the plan costs around 195 Query Bucks, it doesn’t go parallel.

Creating the index shaves about 10 seconds off the ordeal, but now we’re stuck with this serial calamity, and… forcing it parallel doesn’t help.

Our old nemesis, repartition streams, is back.

wackness

Even at DOP 8, we only end up about 2 seconds faster. That’s not a great use of parallelism, and the whole problem sits in the repartition streams.

This is, just like we talked about yesterday, a row mode problem. And just like we talked about the day before that, windowing functions generally do benefit from batch mode.

Thanks for reading!

A Word From Our Sponsors


First, a huge thank you to everyone who has bought my training so far. You all are incredible, and I owe all of you a drink.

Your support means a lot to me, and allows me to do nice stuff for other people, like give training away for free.

So far, I’ve donated $45k (!!!) worth of training to folks in need, no questions asked.

Next year, I’d like to keep doing the same thing. I’d also like to produce a whole lot more training to add value to the money you spend. In order to do that, I need to take time off from consulting, which isn’t easy to do. I’m not crying poor, but saying no to work for chunks of time isn’t easy for a one-person party.

I’m hoping that I can make enough in training bucks to make that possible.

Because this sale is extra sale-y, I’ve decided to name it after the blackest black known to man.

From today until December 31st, you can get all 25 hours of my recorded training content for just $100.00. If you click the link below to add everything to your cart, and use the discount code AllFor100 to apply a discount to your cart.

Everything

Everything

Everything

Some fine print: It only works if you add EVERYTHING. It’s a fixed amount discount code that you need to spend a certain amount to have kick in.

Thank for reading, and for your support.