Considerations For Implementing Soft Deletes

From The Beginning


Implementing soft deletes for an app that’s been around for a while can be tough. In the same way as implementing Partitioning can be tough to add in later to get data management value from (rebuilding clustered indexes on the scheme, making sure all nonclustered indexes are aligned, and all future indexes are too, and making sure you have sufficient partitions at the beginning and end for data movement).

Boy, I really stressed those parentheses out.

If you do either one from the outset, it’s far less painful to manage. The structural stuff is there for you from the beginning, and you can test different strategies early on before data change become difficult to manage.

Queries


The first and most obvious thing is that all your queries now need to only find data that isn’t deleted.

Almost universally, it’s easier to put views on top of tables that have the appropriate bit search for deleted or not deleted rows than to expect people to remember it.

CREATE VIEW dbo.Users_Active
AS
SELECT 
    u.*
FROM dbo.Users AS u
WHERE u.is_deleted = 0;

CREATE VIEW dbo.Users_Inactive
AS
SELECT 
    u.*
FROM dbo.Users AS u
WHERE u.is_deleted = 1;

It’s not that views have any magical performance properties; they’re just queries after all, but it gives you an explicit data source.

Indexes


Depending on how your other queries search for data, you may need to start accounting for the is_deleted flag in your indexes. This could make a really big difference if the optimizer stops choosing your narrower nonclustered indexes because it hates key lookups.

Typically, other predicates will give you a selective-enough result set that a residual predicate on a bit field won’t make much difference. If you’ve already got a seek to the portion of data you’re interested in and most of it will be not-deleted, who cares?

And let’s be honest, in most implementations deleted rows will be the minority of data, and searches for it will be far less common. Usually it’s just there for an occasional audit.

In adjacent cases where instead of deleted you need to designate things as currently active, and you may have many inactive rows compared to active rows, filtered indexes can be your best friend.

Coming back to the views, I don’t think that making them indexed is necessary by default, but it might be if you’re using forced parameterization and filtered indexes.

CREATE TABLE dbo.Users(id int, is_deleted bit);
GO 

CREATE INDEX u ON dbo.Users (id) WHERE is_deleted = 0;
GO

SELECT 
    u.id, u.is_deleted
FROM dbo.Users AS u
WHERE u.is_deleted = 0;

Under simple parameterization, this can be fine. Under forced parameterization, things can get weird.

tutor the tutors

Tables and Tables


In some cases, it might be easier to create tables specifically for deleted rows so you don’t have unnecessary data in your main tables. You can implement this easily enough with after triggers. Just make sure they’re designed to handle multiple rows.

If you want something out of the box, you might mess with:

  • Temporal tables
  • Change Data Capture
  • Change Tracking

However, none of those help you deal with who deleted rows. For that, you’ll need an Audit.

Thanks for reading!

Simplifying Archival Processes

Long List Of Demands


Many suggestions for making changes or improvements to a process in a database are met like orders that mean, if not certain, then likely death.

EXT. A WEBEX MEETING

… Erik is smiling, happy to be wrapping up and moving on. Client DBAs and Developers are sweating profusely, mechanical keyboards all heavily armed with Cherry MX Blue switches in an attempt to drown out the part where he talks about NOLOCK hints. For some reason everyone is smoking cigarettes and finalizing their wills

ERIK

You should enable the remote DAC, it’ll be helpful next time you hit THREADPOOL.

There’s a script included to turn it on, and a bunch of examples on how to use it.

IT Manager

By God, there’s no telling who of us will survive this, but we’ll see’er through.

Oh, Now You Need Consistency


Well, friends, I have good news for you. This is an easy one to implement.

Let’s say that in Stack Overflow land, when a user deletes their account we also delete all their votes. That’s not how it works, but it’s how I’m going to show you how to condense what can normally be a difficult process to isolate into a single operation.

First, we need an archive table. So let it be done.

CREATE TABLE dbo.Votes_Archive
(
    Id int NOT NULL,
    PostId int NOT NULL,
    UserId int NULL,
    BountyAmount int NULL,
    VoteTypeId int NOT NULL,
    CreationDate datetime NOT NULL,
    CONSTRAINT PK_VotesArchive_Id
        PRIMARY KEY CLUSTERED (Id ASC)
);

Next, we need a query to, like, do things. I guess.

Normally, you’d insert the rows you want to keep into the archive table, and then remove them from the source table by using a delete … where exists  type query. Usually in a transaction, because you heard you need them.

Savvier practitioners might even use the output clause with their delete to dump things into a #temp table.

But the savviest practitioners will do it all in one go.

Sauvignon Blanc


What a lot of people miss about output is that the results are tabular, like many other things in a database.

Just like you can select from deleted and inserted virtual tables in a trigger, you can select from them in a… select statement.

I’m using a transaction here because I want to roll it back.

BEGIN TRAN

INSERT dbo.Votes_Archive
    (Id, PostId, UserId, BountyAmount, VoteTypeId, CreationDate)
SELECT 
    v.*
FROM 
(
    DELETE v
        OUTPUT
            Deleted.Id,
            Deleted.PostId,
            Deleted.UserId,
            Deleted.BountyAmount,
            Deleted.VoteTypeId,
            Deleted.CreationDate
    FROM dbo.Votes AS v
    WHERE v.UserId = 190597
) AS v;

--Validate stuff
SELECT * FROM dbo.Votes AS v WHERE v.UserId = 190597;

SELECT * FROM dbo.Votes_Archive AS vi;
--Stop validating stuff, nutso
ROLLBACK

So when whoever user 190597 is deletes their account, all their votes are inserted into the archive table at the same time they’re deleted.

The execution plan looks like this:

polar seltzer

Which is pretty nifty, and eliminates the need for a lot of jumping around.

Thanks for reading!

Defeating Parameter Sniffing With Dynamic SQL

Enjoy!


Thanks for watching!

 

A Word From Our Sponsors


First, a huge thank you to everyone who has bought my training so far. You all are incredible, and I owe all of you a drink.

Your support means a lot to me, and allows me to do nice stuff for other people, like give training away for free.

So far, I’ve donated $45k (!!!) worth of training to folks in need, no questions asked.

Next year, I’d like to keep doing the same thing. I’d also like to produce a whole lot more training to add value to the money you spend. In order to do that, I need to take time off from consulting, which isn’t easy to do. I’m not crying poor, but saying no to work for chunks of time isn’t easy for a one-person party.

I’m hoping that I can make enough in training bucks to make that possible.

Because this sale is extra sale-y, I’ve decided to name it after the blackest black known to man.

From today until December 31st, you can get all 25 hours of my recorded training content for just $100.00. If you click the link below to add everything to your cart, and use the discount code AllFor100 to apply a discount to your cart.

Everything

Everything

Everything

Some fine print: It only works if you add EVERYTHING. It’s a fixed amount discount code that you need to spend a certain amount to have kick in.

Thank for reading, and for your support.

Using Plan Guides To Get Around Query Hints

Prophesy As


According to Not-Australians, there used to be a trace flag that would get queries to ignore any supplied hints. It doesn’t work anymore, which sucks, kinda.

Because people do lots of stupid things with hints. Real stupid things. Things you wouldn’t believe the stupid of.

Let’s say, for example, hypothetically of course, that your front end application would add an index hint to every query.

That index hint may or not be helpful to your query in any way. But there it is.

Let’s also posit, using the very depths of our imaginations, that the front end developer was unlikely to change that behavior.

Planning Fields


We’ve got a couple indexes:

CREATE INDEX r 
    ON dbo.Users(Reputation) 
WITH(MAXDOP = 8, SORT_IN_TEMPDB = ON);

CREATE INDEX c 
    ON dbo.Users(CreationDate) 
WITH(MAXDOP = 8, SORT_IN_TEMPDB = ON);

And we’ve got a query that, via an index hint, is being forced to use the wrong index.

DECLARE @Reputation int = 2;
EXEC sp_executesql N'SELECT * FROM dbo.Users WITH (INDEX  = c) WHERE Reputation = @Reputation;',
                   N'@Reputation int',
                   @Reputation;

The ensuing query plan makes no sense whatsoever.

i really mean it

The things are all backwards. We scan the entire nonclustered index, and do a lookup to the clustered index just to evaluate the @Reputation predicate.

The idea is bad. Please don’t do the idea.

Guiding Bright


There are two things we could do here. We could hint the query to use the index we want, sure.

But what if we change something about this index, or add another one to the table? We might want the optimizer to have a bit more freedom to choose.

I mean, I know. That has its own risks, but whatever.

We can add a plan guide that looks like this:

EXEC sp_create_plan_guide
@name = N'dammit',
@stmt = N'SELECT * FROM dbo.Users WITH (INDEX  = c) WHERE Reputation = @Reputation;',
@type = N'SQL',
@module_or_batch = NULL,
@params = N'@Reputation int',
@hints =  N'OPTION(TABLE HINT(dbo.Users))';

If we were writing proper queries where tables are aliased, it’d look like this:

EXEC sp_create_plan_guide
@name = N'dammit',
@stmt = N'SELECT u.* FROM dbo.Users AS u WITH (INDEX  = c) WHERE u.Reputation = @Reputation;',
@type = N'SQL',
@module_or_batch = NULL,
@params = N'@Reputation int',
@hints =  N'OPTION(TABLE HINT(u))';

When we re-run our query, things look a lot better:

focus

Thanks for reading!

A Word From Our Sponsors


First, a huge thank you to everyone who has bought my training so far. You all are incredible, and I owe all of you a drink.

Your support means a lot to me, and allows me to do nice stuff for other people, like give training away for free.

So far, I’ve donated $45k (!!!) worth of training to folks in need, no questions asked.

Next year, I’d like to keep doing the same thing. I’d also like to produce a whole lot more training to add value to the money you spend. In order to do that, I need to take time off from consulting, which isn’t easy to do. I’m not crying poor, but saying no to work for chunks of time isn’t easy for a one-person party.

I’m hoping that I can make enough in training bucks to make that possible.

Because this sale is extra sale-y, I’ve decided to name it after the blackest black known to man.

From today until December 31st, you can get all 25 hours of my recorded training content for just $100.00. If you click the link below to add everything to your cart, and use the discount code AllFor100 to apply a discount to your cart.

Everything

Everything

Everything

Some fine print: It only works if you add EVERYTHING. It’s a fixed amount discount code that you need to spend a certain amount to have kick in.

Thank for reading, and for your support.

A Suggestion To Make Lookups Less Painful

Odor Of Gas


One problem with Lookups, aside from the usual complaints, is that the optimizer has no options for when the lookup happens.

If the optimizer decides to use a nonclustered index to satisfy some part of the query, but the nonclustered index doesn’t have all of the columns needed to cover what the query is asking for, it has to do a lookup.

Whether the lookup is Key or RID depends on if the table has a clustered index, but that’s not entirely the point.

The point is that there’s no way for the optimizer to decide to defer the lookup until later in the plan, when it might be more opportune.

Gastric Acid


Let’s take one index, and two queries.

CREATE INDEX p
    ON dbo.Posts(PostTypeId, Score, CreationDate)
    INCLUDE(OwnerUserId);

Stop being gross.

SELECT TOP (1000)
    u.DisplayName,
    p.*
FROM dbo.Posts AS p
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 5
ORDER BY p.CreationDate DESC;

SELECT TOP (1000)
    u.DisplayName,
    p.*
FROM dbo.Posts AS p
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 6
ORDER BY p.CreationDate DESC;

The main point here is not that the lookup is bad; it’s actually good, and I wish both queries would use one.

odd choice

If we hint the first query to use the nonclustered index, things turn out better.

SELECT TOP (1000)
    u.DisplayName,
    p.*
FROM dbo.Posts AS p WITH(INDEX = p)
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 5
ORDER BY p.CreationDate DESC;
woah woah woah you can’t use hints here this is a database

Running a full second faster seems like a good thing to me, but there’s a problem.

Ingest


Whether we use the lookup or scan the clustered index, all of these queries ask for rather large memory grants, between 5.5 and 6.5 GB

bigsort4u

The operator asking for memory is the Sort — and while I’d love it if we could index for every sort — it’s just not practical.

So like obviously changing optimizer behavior is way more practical. Ahem.

The reason that the Sort asks for so much memory in each of these cases is that it’s forced to order the entire select output from the Posts table by the CreationDate column.

donk

Detach


If we rewrite the query a bit, we can get the optimizer to sort data long before we go get all the output columns:

SELECT TOP (1000)
    u.DisplayName,
    p2.*
FROM dbo.Posts AS p
JOIN dbo.Posts AS p2
    ON p.Id = p2.Id
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 5
ORDER BY p.CreationDate DESC;

SELECT TOP (1000)
    u.DisplayName,
    p2.*
FROM dbo.Posts AS p
JOIN dbo.Posts AS p2
    ON p.Id = p2.Id
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 6
ORDER BY p.CreationDate DESC;

In both cases, we get the same query plan shape, which is what we’re after:

  • Seek into the nonclustered index on Posts
  • Sort data by CreationDate
  • Join Posts to Users first
  • Join back to Posts for the select list columns
weeeeeeeeee

Because the Sort happens far earlier on in the plan, there’s less of a memory grant needed, and by quite a stretch from the 5+ GB before.

turn down

Thanks for reading!

A Word From Our Sponsors


First, a huge thank you to everyone who has bought my training so far. You all are incredible, and I owe all of you a drink.

Your support means a lot to me, and allows me to do nice stuff for other people, like give training away for free.

So far, I’ve donated $45k (!!!) worth of training to folks in need, no questions asked.

Next year, I’d like to keep doing the same thing. I’d also like to produce a whole lot more training to add value to the money you spend. In order to do that, I need to take time off from consulting, which isn’t easy to do. I’m not crying poor, but saying no to work for chunks of time isn’t easy for a one-person party.

I’m hoping that I can make enough in training bucks to make that possible.

Because this sale is extra sale-y, I’ve decided to name it after the blackest black known to man.

From today until December 31st, you can get all 25 hours of my recorded training content for just $100.00. If you click the link below to add everything to your cart, and use the discount code AllFor100 to apply a discount to your cart.

Everything

Everything

Everything

Some fine print: It only works if you add EVERYTHING. It’s a fixed amount discount code that you need to spend a certain amount to have kick in.

Thank for reading, and for your support.

A Where Clause Problem Recompile Doesn’t Fix

Fast 1


After blogging recently (maybe?) about filters, there was a Stack Exchange question about a performance issue when a variable was declared with a max type.

After looking at it for a minute, I realized that I had never actually checked to see if a recompile hint would allow the optimizer more freedom when dealing with them.

CREATE INDEX u 
    ON dbo.Users(DisplayName);

DECLARE @d nvarchar(MAX) = N'Jon Skeet';

SELECT 
    COUNT_BIG(*) AS records
FROM dbo.Users AS u
WHERE u.DisplayName = @d;

SELECT 
    COUNT_BIG(*) AS records
FROM dbo.Users AS u
WHERE u.DisplayName = @d
OPTION(RECOMPILE);

Turns out that it won’t, which is surprising.

happy cheese

Even though both plans have sort of a weird seek, the filter operator remains as a weird sort of residual predicate.

truly try me

Thanks for reading!

A Word From Our Sponsors


First, a huge thank you to everyone who has bought my training so far. You all are incredible, and I owe all of you a drink.

Your support means a lot to me, and allows me to do nice stuff for other people, like give training away for free.

So far, I’ve donated $45k (!!!) worth of training to folks in need, no questions asked.

Next year, I’d like to keep doing the same thing. I’d also like to produce a whole lot more training to add value to the money you spend. In order to do that, I need to take time off from consulting, which isn’t easy to do. I’m not crying poor, but saying no to work for chunks of time isn’t easy for a one-person party.

I’m hoping that I can make enough in training bucks to make that possible.

Because this sale is extra sale-y, I’ve decided to name it after the blackest black known to man.

From today until December 31st, you can get all 25 hours of my recorded training content for just $100.00. If you click the link below to add everything to your cart, and use the discount code AllFor100 to apply a discount to your cart.

Everything

Everything

Everything

Some fine print: It only works if you add EVERYTHING. It’s a fixed amount discount code that you need to spend a certain amount to have kick in.

Thank for reading, and for your support.

A Parameterization Puzzle With TOP Follow-Up

Spell It Out


Back in October, I had written a couple posts about how parameterizing TOP can cause performance issues:

Anyway, I got back to thinking about it recently because a couple things had jogged in my foggy brain around table valued functions and parameter sniffing.

Go figure.

Reading Rainbow


One technique you could use to avoid this would be to use an inline table valued function, like so:

CREATE OR ALTER FUNCTION dbo.TopParam(@Top bigint)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
SELECT TOP (@Top)
    u.DisplayName,
    b.Name
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT TOP (1)
        b.Name
    FROM dbo.Badges AS b
    WHERE b.UserId = u.Id
    ORDER BY b.Date DESC
) AS b
WHERE u.Reputation > 10000
ORDER BY u.Reputation DESC;
GO

When we select from the function, the top parameter is interpreted as a literal.

SELECT 
    tp.*
FROM dbo.TopParam(1) AS tp;

SELECT 
    tp.*
FROM dbo.TopParam(38) AS tp;
genius!

Performance is “fine” for both in that neither one takes over a minute to run. Good good.

Departures


This is, of course, not what happens in a stored procedure or parameterized dynamic SQL.

EXEC dbo.ParameterTop @Top = 1;
doodad

Keen observers will note that this query runs for 1.2 seconds, just like the plan for the function above.

That is, of course, because this is the stored procedure’s first execution. The @Top parameter has been sniffed, and things have been optimized for the sniffed value.

If we turn around and execute it for 38 rows right after, we’ll get the “fine” performance noted above.

EXEC dbo.ParameterTop @Top = 38;

Looking at the plan in a slightly different way, here’s what the Top operator is telling us, along with what the compile and runtime values in the plan are:

snort

It may make sense to make an effort to cache a plan with @Top = 1 initially to get the “fine” performance. That estimate is good enough to get us back to sending the buffers quickly.

Buggers


Unfortunately, putting the inline table valued function inside the stored procedure doesn’t offer us any benefit.

Without belaboring the point too much:

CREATE PROCEDURE dbo.ParameterTopItvf(@Top BIGINT)  
AS  
BEGIN  
    SET NOCOUNT, XACT_ABORT ON;  
  
    SELECT   
        tp.*  
    FROM dbo.TopParam(@Top) AS tp;  
  
END;  
GO 

EXEC dbo.ParameterTopItvf @Top = 1;

EXEC dbo.ParameterTopItvf @Top = 38;

EXEC sp_recompile 'dbo.ParameterTopItvf';

EXEC dbo.ParameterTopItvf @Top = 38;

EXEC dbo.ParameterTopItvf @Top = 1;

If we do this, running for 1 first gives us “fine” performance, but running for 38 first gives us the much worse performance.

Thanks for reading!

A Word From Our Sponsors


First, a huge thank you to everyone who has bought my training so far. You all are incredible, and I owe all of you a drink.

Your support means a lot to me, and allows me to do nice stuff for other people, like give training away for free.

So far, I’ve donated $45k (!!!) worth of training to folks in need, no questions asked.

Next year, I’d like to keep doing the same thing. I’d also like to produce a whole lot more training to add value to the money you spend. In order to do that, I need to take time off from consulting, which isn’t easy to do. I’m not crying poor, but saying no to work for chunks of time isn’t easy for a one-person party.

I’m hoping that I can make enough in training bucks to make that possible.

Because this sale is extra sale-y, I’ve decided to name it after the blackest black known to man.

From today until December 31st, you can get all 25 hours of my recorded training content for just $100.00. If you click the link below to add everything to your cart, and use the discount code AllFor100 to apply a discount to your cart.

Everything

Everything

Everything

Some fine print: It only works if you add EVERYTHING. It’s a fixed amount discount code that you need to spend a certain amount to have kick in.

Thank for reading, and for your support.

When Does UDF Inlining Kick In?

The Eye


UPDATE: After writing this and finding the results fishy, I reported the behavior described below in “Somewhat Surprising” and “Reciprocal?” and it was confirmed a defect in SQL Server 2019 CU8, though I haven’t tested earlier CUs to see how far back it goes. If you’re experiencing this behavior, you’ll have to disable UDF inlining in another way, until CU releases resume in the New Year.

With SQL Server 2019, UDF inlining promises to, as best it can, inline all those awful scalar UDFs that have been haunting your database for ages and making queries perform terribly.

But on top of the long list of restrictions, there are a number of other things that might inhibit it from kicking in.

For example, there’s a database scoped configuration:

ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = ON/OFF; --Toggle this

SELECT 
    dsc.*
FROM sys.database_scoped_configurations AS dsc
WHERE dsc.name = N'TSQL_SCALAR_UDF_INLINING';

There’s a function characteristic you can use to turn them off:

CREATE OR ALTER FUNCTION dbo.whatever()
RETURNS something
WITH INLINE = ON/OFF --Toggle this
GO

And your function may or not even be eligible:

SELECT 
    OBJECT_NAME(sm.object_id) AS object_name,
    sm.is_inlineable
FROM sys.sql_modules AS sm
JOIN sys.all_objects AS ao
    ON sm.object_id = ao.object_id
WHERE ao.type = 'FN';

Somewhat Surprising


One thing that caught me off guard was that having the database in compatibility level 140, but running the query in compatibility level 150 also nixed the dickens out of it.

DBCC FREEPROCCACHE;
GO 

ALTER DATABASE StackOverflow2013 SET COMPATIBILITY_LEVEL = 140;
GO 

WITH Comments AS 
(
    SELECT
        dbo.serializer(1) AS udf, --a function
        ROW_NUMBER() 
            OVER(ORDER BY 
                     c.CreationDate) AS n
    FROM dbo.Comments AS c
)
SELECT 
    c.*
FROM Comments AS c
WHERE c.n BETWEEN 1 AND 100
OPTION(USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_150'), MAXDOP 8);
GO

Our query has all the hallmarks of one that has been inflicted with functions:

it can’t go parallel

And if you’re on SQL Server 2016+, you can see that it executes once per row:

SELECT 
    OBJECT_NAME(defs.object_id) AS object_name,
    defs.execution_count,
    defs.total_worker_time,
    defs.total_physical_reads,
    defs.total_logical_writes,
    defs.total_logical_reads,
    defs.total_elapsed_time
FROM sys.dm_exec_function_stats AS defs;
rockin’ around

Reciprocal?


There’s an odd contradiction here, though. If we repeat the experiment setting the database compatibility level to 150, but running the query in compatibility level 140, the function is inlined.

DBCC FREEPROCCACHE;
GO 

ALTER DATABASE StackOverflow2013 SET COMPATIBILITY_LEVEL = 150;
GO 

WITH Comments AS 
(
    SELECT
        dbo.serializer(c.Id) AS udf,
        ROW_NUMBER() 
            OVER(ORDER BY 
                     c.CreationDate) AS n
    FROM dbo.Comments AS c
)
SELECT 
    c.*
FROM Comments AS c
WHERE c.n BETWEEN 1 AND 100
OPTION(USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140'), MAXDOP 8);
GO

Rather than seeing a non-parallel plan, and non-parallel plan reason, we see a parallel plan, and an attribute telling us that a UDF has been inlined.

call hope

And if we re-check the dm_exec_function_stats DMV, it will have no entries. That seems more than a little bit weird to me, but hey.

I’m just a lowly consultant on SSMS 18.6

Thanks for reading!

A Word From Our Sponsors


First, a huge thank you to everyone who has bought my training so far. You all are incredible, and I owe all of you a drink.

Your support means a lot to me, and allows me to do nice stuff for other people, like give training away for free.

So far, I’ve donated $45k (!!!) worth of training to folks in need, no questions asked.

Next year, I’d like to keep doing the same thing. I’d also like to produce a whole lot more training to add value to the money you spend. In order to do that, I need to take time off from consulting, which isn’t easy to do. I’m not crying poor, but saying no to work for chunks of time isn’t easy for a one-person party.

I’m hoping that I can make enough in training bucks to make that possible.

Because this sale is extra sale-y, I’ve decided to name it after the blackest black known to man.

From today until December 31st, you can get all 25 hours of my recorded training content for just $100.00. If you click the link below to add everything to your cart, and use the discount code AllFor100 to apply a discount to your cart.

Everything

Everything

Everything

Some fine print: It only works if you add EVERYTHING. It’s a fixed amount discount code that you need to spend a certain amount to have kick in.

Thank for reading, and for your support.

Annoyances When Indexing For Windowing Functions

One Day


I will be able to not care about this sort of thing. But for now, here we are, having to write multiple blogs in a day to cover a potpourri of grievances.

Let’s get right to it!

First, without a where clause, the optimizer doesn’t think that an index could improve one single, solitary metric about this query. We humans know better, though.

WITH Votes AS 
(
    SELECT
        v.Id,
        ROW_NUMBER() 
            OVER(PARTITION BY 
                     v.PostId 
                 ORDER BY 
                     v.CreationDate) AS n
    FROM dbo.Votes AS v
)
SELECT *
FROM Votes AS v
WHERE v.n = 0;

The tough part of this plan will be putting data in order to suit the Partition By, and then the Order By, in the windowing function.

Without any other clauses against columns in the Votes table, there are no additional considerations.

Two Day


What often happens is that someone wants to add an index to help the windowing function along, so they follow some basic guidelines they found on the internet.

What they end up with is an index on the Partition By, Order By, and then Covering any additional columns. In this case there’s no additional Covering Considerations, so we can just do this:

CREATE INDEX v2 ON dbo.Votes(PostId, CreationDate);

If you’ve been following my blog, you’ll know that indexes put data in order, and that with this index you can avoid needing to physically sort data.

limousine

Three Day


The trouble here is that, even though we have Cost Threshold For Parallelism (CTFP) set to 50, and the plan costs around 195 Query Bucks, it doesn’t go parallel.

Creating the index shaves about 10 seconds off the ordeal, but now we’re stuck with this serial calamity, and… forcing it parallel doesn’t help.

Our old nemesis, repartition streams, is back.

wackness

Even at DOP 8, we only end up about 2 seconds faster. That’s not a great use of parallelism, and the whole problem sits in the repartition streams.

This is, just like we talked about yesterday, a row mode problem. And just like we talked about the day before that, windowing functions generally do benefit from batch mode.

Thanks for reading!

A Word From Our Sponsors


First, a huge thank you to everyone who has bought my training so far. You all are incredible, and I owe all of you a drink.

Your support means a lot to me, and allows me to do nice stuff for other people, like give training away for free.

So far, I’ve donated $45k (!!!) worth of training to folks in need, no questions asked.

Next year, I’d like to keep doing the same thing. I’d also like to produce a whole lot more training to add value to the money you spend. In order to do that, I need to take time off from consulting, which isn’t easy to do. I’m not crying poor, but saying no to work for chunks of time isn’t easy for a one-person party.

I’m hoping that I can make enough in training bucks to make that possible.

Because this sale is extra sale-y, I’ve decided to name it after the blackest black known to man.

From today until December 31st, you can get all 25 hours of my recorded training content for just $100.00. If you click the link below to add everything to your cart, and use the discount code AllFor100 to apply a discount to your cart.

Everything

Everything

Everything

Some fine print: It only works if you add EVERYTHING. It’s a fixed amount discount code that you need to spend a certain amount to have kick in.

Thank for reading, and for your support.

An Overlooked Benefit Of Batch Mode In Parallel Plans

Make It Count


When queries go parallel, you want them to be fast. Sometimes they are, and it’s great.

Other times they’re slow, and you end up staring helplessly at a repartition streams operator.

brick wall

Sometimes you can reduce the problem with higher DOP hints, or better indexing, but overall it’s a crappy situation.

Snap To


Let’s admire a couple familiar looking queries, because that’s been working really well for us so far.

WITH Comments AS 
(
    SELECT
        ROW_NUMBER() 
            OVER(PARTITION BY 
                     c.UserId
                 ORDER BY 
                     c.CreationDate) AS n
    FROM dbo.Comments AS c
)
SELECT *
FROM Comments AS c
WHERE c.n = 0
OPTION(USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140'));

WITH Comments AS 
(
    SELECT
        ROW_NUMBER() 
            OVER(PARTITION BY 
                     c.UserId
                 ORDER BY 
                     c.CreationDate) AS n
    FROM dbo.Comments AS c
)
SELECT *
FROM Comments AS c
WHERE c.n = 0
OPTION(USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_150'));

One is going to run in compatibility level 140, the other in 150, as foretold by ancient alien prophecy.

The two query plans will have a bit in common, but…

just batch

The second query, which runs in batch mode, runs about 15 seconds faster. One big reason why is that we skip that most unfortunate repartition streams operator.

It’s a cold sore. An actual factual cold sore.

The only ways I’ve found to fix it completely are:

  • Induce batch mode
  • Use the parallel apply technique

But the parallel apply technique doesn’t help much here, because of local factors.

In this case, me generating the largest possible result set and then filtering it down to nothing at the end.

Thanks for reading!

A Word From Our Sponsors


First, a huge thank you to everyone who has bought my training so far. You all are incredible, and I owe all of you a drink.

Your support means a lot to me, and allows me to do nice stuff for other people, like give training away for free.

So far, I’ve donated $45k (!!!) worth of training to folks in need, no questions asked.

Next year, I’d like to keep doing the same thing. I’d also like to produce a whole lot more training to add value to the money you spend. In order to do that, I need to take time off from consulting, which isn’t easy to do. I’m not crying poor, but saying no to work for chunks of time isn’t easy for a one-person party.

I’m hoping that I can make enough in training bucks to make that possible.

Because this sale is extra sale-y, I’ve decided to name it after the blackest black known to man.

From today until December 31st, you can get all 25 hours of my recorded training content for just $100.00. If you click the link below to add everything to your cart, and use the discount code AllFor100 to apply a discount to your cart.

Everything

Everything

Everything

Some fine print: It only works if you add EVERYTHING. It’s a fixed amount discount code that you need to spend a certain amount to have kick in.

Thank for reading, and for your support.