Improving AT TIME ZONE Performance

Queries that use the AT TIME ZONE operator may perform worse than desired. For example, Jonathan Kehayias blogged about performance struggles with AT TIME ZONE at sqlskills.com. One key difference between Jonathan’s opinion and my own is that I enjoy writing these kinds of posts.

Test Data

The test data consists of one million rows per day over the month of January 2022 for a total of 31 million rows.

DROP TABLE IF EXISTS dbo.CCIForBlog;

CREATE TABLE dbo.CCIForBlog (
    SaleTimeUTC DATETIME2 NOT NULL,
    WidgetCount BIGINT NOT NULL,
    INDEX CCI CLUSTERED COLUMNSTORE
);

GO

SET NOCOUNT ON;

DECLARE
    @StartDate DATETIME2 = '20220101',
    @DaysToLoad INT = 31,
    @DaysLoaded INT = 0;

WHILE @DaysLoaded < @DaysToLoad
BEGIN
    INSERT INTO dbo.CCIForBlog (SaleTimeUTC, WidgetCount)
    SELECT DATEADD(SECOND, q.RN / 11.5, @StartDate), q.RN / 100000
    FROM
    (
        SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM master..spt_values t1
        CROSS JOIN master..spt_values t2
    ) q (RN)
    OPTION (MAXDOP 1);

    SET @StartDate = DATEADD(DAY, 1, @StartDate);
    SET @DaysLoaded = @DaysLoaded + 1;
END;

CREATE STATISTICS S1 ON CCIForBlog (SaleTimeUTC);

Data is saved to a table with a clustered columnstore index. Of course, this is a small amount of data for a columnstore table. However, it is more than sufficient to demonstrate AT TIME ZONE as a performance bottleneck.

Filtering

Suppose an end user wants a count of widgets sold between January 3rd and January 6th. A first attempt at this query could look like the following:

SELECT SUM(WidgetCount)
FROM dbo.CCIForBlog c
WHERE c.SaleTimeUTC >= '20220103'
AND c.SaleTimeUTC < '20220106'
OPTION (MAXDOP 1);

This query plays to all of columnstore's strengths and it only takes around 10 milliseconds to execute. The data was loaded in date order, so most of the rowgroups are eliminated. However, end users don't think in terms of UTC time. The end user actually wanted Central Standard Time. After a bit of research, the query is changed to the following:

SELECT SUM(WidgetCount)
FROM dbo.CCIForBlog c
WHERE SWITCHOFFSET(c.SaleTimeUTC, 0) AT TIME ZONE 'Central Standard Time' >= '20220103'
AND SWITCHOFFSET(c.SaleTimeUTC, 0) AT TIME ZONE 'Central Standard Time' < '20220106'
OPTION (MAXDOP 1);

The AT TIME ZONE operator is useful when the number of offset minutes is unknown. UTC always has an offset of 0, so it is better to use SWITCHOFFSET(datetimeoffset_expression, 0) as opposed to AT TIME ZONE 'UTC'. Even so, the query still takes over 3 minutes to execute on my machine, and nearly all of the execution time is spent performing AT TIME ZONE calculations.

Note that using c.SaleTimeUTC AT TIME ZONE 'UTC' AT TIME ZONE 'Central Standard Time' would double the execution time.
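As a quick sanity check, both forms below return the same datetimeoffset value for a datetime2 input; the SWITCHOFFSET version just skips the time zone lookup. A minimal sketch:

DECLARE @dt DATETIME2 = '20220103';

-- Both columns return 2022-01-03 00:00:00.0000000 +00:00.
SELECT
    SWITCHOFFSET(@dt, 0) AS ViaSwitchOffset,
    @dt AT TIME ZONE 'UTC' AS ViaAtTimeZone;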

One idea is to rewrite the filtering so that the time zone conversions are done on the constant values instead of the column:

WHERE c.SaleTimeUTC >= SWITCHOFFSET('20220103' AT TIME ZONE 'Central Standard Time', 0)
AND c.SaleTimeUTC < SWITCHOFFSET('20220106' AT TIME ZONE 'Central Standard Time', 0)

I strongly recommend against this approach. It can lead to wrong results for some time zones and boundary points. Instead, you can use the fact that datetimeoffset only supports offsets of up to ±14 hours. That means that (local - 14 hours) < UTC < (local + 14 hours) holds for any time zone and any point in time. A logically redundant filter can be added to the query:

SELECT SUM(WidgetCount)
FROM dbo.CCIForBlog c
WHERE SWITCHOFFSET(c.SaleTimeUTC, 0) AT TIME ZONE 'Central Standard Time' >= '20220103'
AND SWITCHOFFSET(c.SaleTimeUTC, 0) AT TIME ZONE 'Central Standard Time' < '20220106'
AND c.SaleTimeUTC >= DATEADD(HOUR, -14, '20220103')
AND c.SaleTimeUTC < DATEADD(HOUR, 14, '20220106')
OPTION (MAXDOP 1);

The newly improved query finishes in around 26 seconds. It is able to benefit from rowgroup elimination and performs significantly fewer time zone conversions compared to the original query. In this case, we were able to use knowledge about time zones and a bit of date math to improve performance from over 3 minutes to about 26 seconds.

Grouping

Suppose that an end user wants widget counts summarized by date. A first attempt at this query could look like the following:

SELECT ca.ConvertedDate, SUM(WidgetCount)
FROM dbo.CCIForBlog c
CROSS APPLY (
    SELECT CAST(c.SaleTimeUTC AS DATE)
) ca (ConvertedDate)
GROUP BY ca.ConvertedDate
OPTION (MAXDOP 1);

This query takes about 1 second on my machine. However, once again, the end user wants the data to be in CST instead of UTC. The following approach takes around 3 minutes:

SELECT ca.ConvertedDate, SUM(WidgetCount)
FROM dbo.CCIForBlog c
CROSS APPLY (
    SELECT CAST(SWITCHOFFSET(c.SaleTimeUTC, 0) AT TIME ZONE 'Central Standard Time' AS DATE)
) ca (ConvertedDate)
GROUP BY ca.ConvertedDate
OPTION (MAXDOP 1);

This should not be a surprise because the bottleneck in query performance is performing 31 million AT TIME ZONE calculations. That doesn’t change if the query performs filtering or grouping.

Historically, governments only make daylight saving time or offset switches on a minute boundary. For example, an offset won't change at 2:00:01 AM, but it might change at 2:00:00 AM. The source data has one million rows per day, so first grouping by the datetime truncated to the minute, applying the time zone conversion to those distinct truncated values, and finally grouping by date should lead to a significant performance improvement. One way to accomplish this:

SELECT ca.ConvertedDate, SUM(SumWidgetCount)
FROM 
(
    SELECT DATEADD(MINUTE, DATEDIFF(MINUTE, '20000101', c.SaleTimeUTC), '20000101'), SUM(WidgetCount)
    FROM dbo.CCIForBlog c
    GROUP BY DATEADD(MINUTE, DATEDIFF(MINUTE, '20000101', c.SaleTimeUTC), '20000101')
) q (SaleTimeUTCTrunc, SumWidgetCount)
CROSS APPLY (
    SELECT CAST(SWITCHOFFSET(q.SaleTimeUTCTrunc, 0) AT TIME ZONE 'Central Standard Time' AS DATE)
) ca (ConvertedDate)
GROUP BY ca.ConvertedDate
OPTION (MAXDOP 1);

The new query takes around 4 seconds on my machine. It needs to perform 44650 time zone conversions instead of 31 million. Once again, we were able to use knowledge about time zones and a bit of date math to improve performance.
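If you want to gain confidence that the truncation rewrite is equivalent before relying on it, one option is a quick EXCEPT check between the two forms (a sketch I'd run once, swapping the operands to test both directions; zero rows returned means the aggregates match):

SELECT ca.ConvertedDate, SUM(WidgetCount)
FROM dbo.CCIForBlog c
CROSS APPLY (
    SELECT CAST(SWITCHOFFSET(c.SaleTimeUTC, 0) AT TIME ZONE 'Central Standard Time' AS DATE)
) ca (ConvertedDate)
GROUP BY ca.ConvertedDate

EXCEPT

SELECT ca.ConvertedDate, SUM(SumWidgetCount)
FROM
(
    SELECT DATEADD(MINUTE, DATEDIFF(MINUTE, '20000101', c.SaleTimeUTC), '20000101'), SUM(WidgetCount)
    FROM dbo.CCIForBlog c
    GROUP BY DATEADD(MINUTE, DATEDIFF(MINUTE, '20000101', c.SaleTimeUTC), '20000101')
) q (SaleTimeUTCTrunc, SumWidgetCount)
CROSS APPLY (
    SELECT CAST(SWITCHOFFSET(q.SaleTimeUTCTrunc, 0) AT TIME ZONE 'Central Standard Time' AS DATE)
) ca (ConvertedDate)
GROUP BY ca.ConvertedDate;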

Functioning

I’ve developed and open sourced replacement functions for AT TIME ZONE to provide an easier way of improving performance for queries that use AT TIME ZONE. The TZGetOffsetsDT2 function returns a pair of offsets and the TZFormatDT2 function transforms those offsets into the desired data type. The filtering query can be written as the following:

SELECT SUM(WidgetCount)
FROM dbo.CCIForBlog c
OUTER APPLY dbo.TZGetOffsetsDT2 (c.SaleTimeUTC, N'UTC', N'Central Standard Time') o
CROSS APPLY dbo.TZFormatDT2 (c.SaleTimeUTC, N'UTC', N'Central Standard Time', o.OffsetMinutes, o.TargetOffsetMinutes) f
WHERE f.ConvertedDateTime2 >= '20220103'
AND f.ConvertedDateTime2 < '20220106'
OPTION (MAXDOP 1);

The new query takes around 10 seconds to execute. The new query plan is able to use batch mode processing at every step.

If desired, performance can be further improved by adding the same redundant filters as before:

SELECT SUM(WidgetCount)
FROM dbo.CCIForBlog c
OUTER APPLY dbo.TZGetOffsetsDT2 (c.SaleTimeUTC, N'UTC', N'Central Standard Time') o
CROSS APPLY dbo.TZFormatDT2 (c.SaleTimeUTC, N'UTC', N'Central Standard Time', o.OffsetMinutes, o.TargetOffsetMinutes) f
WHERE f.ConvertedDateTime2 >= '20220103'
AND f.ConvertedDateTime2 < '20220106'
AND c.SaleTimeUTC >= DATEADD(HOUR, -14, '20220103')
AND c.SaleTimeUTC < DATEADD(HOUR, 14, '20220106')
OPTION (MAXDOP 1);

The most optimized version takes around 1 second to execute. Quite an improvement compared to 3 minutes!

The grouping query can also be rewritten using the new functions:

SELECT f.ConvertedDate, SUM(WidgetCount)
FROM dbo.CCIForBlog c
OUTER APPLY dbo.TZGetOffsetsDT2 (c.SaleTimeUTC, N'UTC', N'Central Standard Time') o
CROSS APPLY dbo.TZFormatDT2 (c.SaleTimeUTC, N'UTC', N'Central Standard Time', o.OffsetMinutes, o.TargetOffsetMinutes) f
GROUP BY f.ConvertedDate
OPTION (MAXDOP 1);

This version takes about 7 seconds to execute. This is slower than the date truncation method which took 4 seconds but still much faster than the original AT TIME ZONE method.

Unfortunately, SQL Server limitations require a pair of functions to get the best performance. The open source project does provide a simpler function that can be called by itself but it is only eligible for nested loop joins.

Upgrading

SQL Server 2022 RC1 has signs of improvement for AT TIME ZONE. The basic queries that use AT TIME ZONE in this blog post take about 75 seconds to execute on my machine, so it could be estimated that SQL Server 2022 will reduce the CPU footprint of AT TIME ZONE by 60%. Release candidates are not fully optimized so it’s possible that final performance numbers will be different once the product goes GA. I suspect that these performance improvements are already present in Azure SQL Database but I can’t find any documentation for the change.

Final Thoughts

Please try my open source project if you’re experiencing performance problems with AT TIME ZONE. Thanks for reading!

Fixing Ordered Column Store Sorting In SQL Server 2022

Groove Is In The Heart


When Brent posted about the availability of, and disappointment with, creating ordered column store indexes in SQL Server 2022, I got to work.

I can’t have my dear friend Brent being all distraught with all those fast cars around. That’s how accidents happen, and I fear he might leave the Blitz scripts to me in his will or something.

Anyway, I decided to dig in and see what was going on behind the scenes. Which of course, means query plans, and bothering people who are really good at debuggers.

Most of the problems that you’ll run into in SQL Server will come from sorting data.

Whenever I have to think about Sorts, I head to this post about all the different Sorts you might see in a query plan.

More on that later, though.

Cod Piece


In Paul’s post, he talks about using undocumented trace flag 8666 to get additional details about Sort operators.

Let’s do that. Paul is smart, though he is always completely wrong about which season it is.

DROP TABLE IF EXISTS
    dbo.Votes_CCI;

SELECT
    v.*
INTO dbo.Votes_CCI
FROM dbo.Votes AS v;

I’m using the Votes table because it’s nice and narrow and I don’t have to tinker with any string columns.

Strings in databases were a mistake, after all.

DBCC TRACEON(8666);
CREATE CLUSTERED COLUMNSTORE INDEX
    vcci
ON dbo.Votes_CCI
ORDER (Postid);
DBCC TRACEOFF(8666);

Here’s what we get back in the query plan:

[Query plan screenshot: "Tainted Sort"]

We’ve got a Soft Sort! What does our seasonally maladjusted friend say about those?

A “soft sort” uses only its primary memory grant and never spills. It doesn’t guarantee fully-sorted output. Each sort run using the available memory grant will be sorted. A “soft sort” represents a best effort given the resources available. This property can be used to infer that a Sort is implemented with CQScanPartitionSortNew without attaching a debugger. The meaning of the InMemory property flag shown above will be covered in part 2. It does not indicate whether a regular sort was performed in memory or not.

Well, with that attitude, it’s not surprising that there are so many overlapping buckets in the column store index. If it’s not good enough, what can you do?

Building the index with the Soft Sort here also leads to things being as bad as they were in Brent’s post.

Insert Debugging Here


Alas, there’s (almost) always a way. Microsoft keeps making these trace flag things.

There are a bunch of different ways to track them down, but figuring out the behavior of random trace flags that you may find just by enabling them isn’t easy.

One way to tie a trace flag to a behavior is to use WinDbg to step through different behaviors in action, and see if SQL Server checks to see if a trace flag is enabled when that behavior is performed.

If you catch that, you can be reasonably sure that the trace flag will have some impact on the behavior. Not all trace flags can be enabled at runtime. Some need to be enabled as startup options.
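If you're experimenting with flags, DBCC TRACESTATUS shows what's currently enabled, and the -1 argument to DBCC TRACEON applies a flag globally instead of to the current session. A quick sketch:

-- List every trace flag currently enabled (session and global).
DBCC TRACESTATUS(-1);

-- Enable a flag for all sessions, not just this one.
DBCC TRACEON(8666, -1);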

Sometimes it’s hours and hours of work to track this stuff down, and other times Paul White (b|t) already has notes on helpful ones.

The trace flag below, 2417, is present going back to SQL Server 2014, and can help with the Soft Sort issues we’re seeing when building ordered clustered column store indexes today.

Here’s another one:

DBCC TRACEON(8666, 2417);
CREATE CLUSTERED COLUMNSTORE INDEX
    vcci
ON dbo.Votes_CCI
ORDER (Postid)
WITH(MAXDOP = 1);
DBCC TRACEOFF(8666, 2417);

The MAXDOP 1 hint isn’t strictly necessary. With a parallel plan, you may see up to DOP overlapping row groups.

[Query plan screenshot: "community service"]

That’s why it was a popular maneuver to emulate this behavior by creating a clustered row store index, and then creating a clustered column store index over it with DROP_EXISTING and a MAXDOP 1 hint.

At DOP 1, you don’t see that overlap. It takes a lot longer of course — 3 minutes instead of 30 or so seconds — which is a real bummer. But without it, you could see up to DOP overlapping rowgroups.

If you want All The Pretty Little Rowgroups, this is what you have to do.

Anyway, the result using sp_BlitzIndex looks a lot better now:

EXEC sp_BlitzIndex
    @TableName = 'Votes_CCI';

[Query results screenshot: "capture the flag"]

How nice.
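If you'd rather check without sp_BlitzIndex, the segment metadata exposes a per-rowgroup minimum and maximum. This is a rough sketch: I'm assuming column_id 2 is PostId's ordinal within the index, and that for integer columns min_data_id and max_data_id hold the raw values, so overlapping ranges across segments mean overlapping rowgroups:

SELECT
    s.segment_id,
    s.min_data_id, -- lowest PostId in the rowgroup
    s.max_data_id  -- highest PostId in the rowgroup
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p
    ON s.hobt_id = p.hobt_id
WHERE p.object_id = OBJECT_ID('dbo.Votes_CCI')
AND s.column_id = 2 -- assumed ordinal of PostId
ORDER BY s.segment_id;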

You can also use undocumented and unsupported trace flag 11621, which is

[A] feature flag for the ‘partition sort on column store order’ so the end result is similar, but via a different mechanism to 2417.
A partition sort is useful in general to prevent unnecessary switching between partitions. If you sort the stream by partition, you process all the rows for one before moving on to the next. A soft sort is ok there because it’s just a performance optimization. Worst case, you end up switching between partitions quite often because the sort ran out of memory, but correct results will still occur.

Chain Gang


A “reasonable” alternative to trace flags may be to adjust the index create memory configuration option. If we set it down to the minimum value, we get a “helpful” error message:

EXEC sys.sp_configure 
    'index create memory', 
    704;

RECONFIGURE;

As promised:

Msg 8606, Level 17, State 1, Line 31

This index operation requires 123208 KB of memory per DOP.

The total requirement of 985800 KB for DOP of 8 is greater than the sp_configure value of 704 KB set for the advanced server configuration option “index create memory (KB)”.

Increase this setting or reduce DOP and rerun the query.

If you get the actual execution plan for the clustered column store index create or rebuild with the Soft Sort disabled and look at the memory grant, you get a reasonable estimate for what to set index create memory to.

Changing it does two things:

  • Avoids the very low memory grant that Soft Sorts receive, which is what causes the uneven row groups
  • The Soft Sort keeps the index create from going above that index create memory number

Setting index create memory for this particular index creation/rebuild to 5,561,824 KB gets you the nice, even row groups (at MAXDOP 1) that we saw when disabling the Soft Sort entirely.
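In sp_configure terms, that's just the following (5,561,824 KB is the number from my rebuild; check the memory grant in your own actual plan for yours):

-- Assumes 'show advanced options' is already enabled,
-- as in the earlier snippet.
EXEC sys.sp_configure
    'index create memory',
    5561824;

RECONFIGURE;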

Bottom line, here is that uneven row groups happen with column store indexes when there’s a:

  • Parallel create/rebuild
  • Low memory grant create/rebuild

If this sort of thing is particularly important to you, you could adjust index create memory to a value that allows the Soft Sort adequate memory.

But that’s a hell of a lot of work, and I hope Microsoft just fixes this in a later build.

Reality Bites


The cute thing here is that, while this syntactical functionality has been available in Azure Cloud Nonsense© for some time, no one uses that, so no one cares.

The bits for this were technically available in SQL Server 2019 as well, but I’m not telling you how to do that. It’s not supported, and bad things might happen if you use it.

I mean, bad things happen in SQL Server 2022 where it’s supported unless you use an undocumented trace flag, but… Uh. I dunno.

This trace flag seems to set things back to how things worked in the Before Times, though, which is probably how they should have stayed.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Ordered Columnstore Indexes on SQL Server 2022 CTP 2.0

Brent recently blogged about ordered columnstore indexes in SQL Server 2022 and had some trouble with them, so I decided to take a look into the mechanics of the feature. I’m testing on SQL Server 2022 CTP 2.0.

What does the ordered columnstore feature do?

  1. A sort operator may be added to query plans that insert into the table. The sort operator is a bit unusual in that the data may not be fully sorted.
  2. A sort operator is added when initially creating an ordered columnstore index. The level of rowgroup elimination fragmentation will depend on memory, DOP, and other factors.
  3. A sort operator is added when rebuilding an ordered columnstore index. The level of rowgroup elimination fragmentation will depend on memory, DOP, and other factors.

Technical details for insert

The sort for inserting into an ordered columnstore is a DML request sort. It appears to use the same internal mechanism as the sort that’s added for inserting into partitioned columnstore tables. The difference is that the data is sorted by the specified columns instead of a calculated partition id. In my testing, the sort appears to be a best effort sort that does not spill to tempdb. This means that if SQL Server thinks there won’t be enough memory then the data will not be fully sorted. Parallel inserts have an additional complication. Consider the following query plan image:

That is a row mode sort, because a batch mode parallel sort would put all resulting rows on a single thread, which would make the parallel insert pointless. However, there’s no repartition streams operator as a child of the sort. Data is sorted on each thread in a best effort fashion. Even if there is enough memory to fully sort the data, you will end up with DOP threads of sorted data instead. The data will not be sorted globally. The split into threads will increase rowgroup elimination fragmentation.

As mentioned earlier, the sort operator does not always appear. It is not present when the cardinality estimate is very low (around 250 rows). I suspect that the same logic is used for adding the sort as for adding the memory grant for compression. For very low cardinality estimates, the data will be inserted into delta rowgroups, even if there are more than 102399 rows. By that same reasoning, I expect that there is no sort operator if the INSERT query hits a memory grant timeout.

For more information on this sort, see CQScanPartitionSortNew in Paul White’s blog post about different sort types in SQL Server.

Technical details for CREATE/REBUILD index

I spent less time looking into the sort that’s added as part of CREATE or REBUILD index. In my testing, the sort again does not spill to tempdb. The sort is also performed on a per thread basis for parallel index operations. The minimum fragmentation level will be achieved for a MAXDOP 1 operation with sufficient memory. Reducing memory or running the create index in parallel will increase fragmentation. This is unfortunate because ordered columnstore indexes do not support online index creation or rebuilds.

An unpopular opinion

I think that the community worries too much about achieving perfect segment ordering and keeping rowgroups at exactly 1048576 rows with columnstore. If you perform basic maintenance and partition tables appropriately then that should be good enough for most data warehouse workloads. Most query performance issues are going to be caused by getting no elimination at all, scanning through too many soft-deleted rows, or the usual query performance problems. Scanning 11 rowgroups instead of 5 probably isn’t why your queries on columnstore indexes are slow today.

Final thoughts

As is, this feature can be described as a poor man’s partitioning. The sweet spot for this functionality feels very limited to me in its current state, but we’re still on CTP 2.0. Maybe there will be changes before RTM. Thanks for reading!

What Does And Doesn’t Belong In A SQL Server Data Warehouse

No Way, No How


This is a list of things I see in data warehouses that make me physically ill:

  • Unique constraints of any kind: Primary Keys, Indexes, etc. Make things unique during your staging process. Don’t make your indexes do that work.
  • Foreign Keys: Referential integrity should be guaranteed from your data source. If it can’t be, there’s no sense in making it happen in your data warehouse. Foreign Keys in SQL Server suck anyway, and slow the hell out of large data loads.
  • Clustered row store indexes: At this point in time, when you need a clustered index, it ought to be a clustered column store index.
  • Tables with “lots” of nonclustered row store indexes: They’ll only slow down your load times a whole bunch. Replace them with nonclustered column store indexes.
  • Indexed views: This isn’t 2012 anymore. Column store, column store, column store.
  • Standard Edition: The CPU limit of 24 cores is probably fine, but the buffer pool cap of 128GB and strict limitations on column store/batch mode are horrendous.

I know what you’re thinking looking at this list: I can drop and re-create things like unique constraints, foreign keys, and nonclustered indexes. You sure can, but you’re wasting a ton of time.

Data warehouses have a completely different set of needs from transaction systems. The sooner you stop treating data warehouses like AdventureWorks, the better.

That’s all.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Blocking on Columnstore Indexes that RCSI and NOLOCK Don’t Resolve

I recently ran into a production issue where a SELECT query that referenced a NOLOCK-hinted table was hitting a 30 second query timeout. Query store wait stats suggested that the issue was blocking on a table with a nonclustered columnstore index (NCCI). This was quite unexpected to me, and I was eventually able to produce a reproduction of the issue. I believe this to be a bug in SQL Server that’s present in both RTM and the current CU as of this blog post (CU14). The issue impacts CCIs as well, but I did significantly less testing with that index type.

The Setup

First I’ll create a heap with 500k rows. All of the rows have the same value. I’ll also create an NCCI on the single column of the table.

DROP TABLE IF EXISTS dbo.TEST_NCCI_1;
CREATE TABLE dbo.TEST_NCCI_1 (
    ID VARCHAR(10) NOT NULL
);

INSERT INTO dbo.TEST_NCCI_1
SELECT TOP (500000) '1'
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;

CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI ON dbo.TEST_NCCI_1 (ID) WITH (MAXDOP = 1);

The NCCI has a single compressed rowgroup with no rows in the delta store or delete buffer. It's an honest table and I'm not trying to trick you.
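You can verify that with the rowgroup DMV; a quick sketch that should show one COMPRESSED rowgroup with 500,000 rows and nothing deleted:

SELECT
    rg.state_desc,
    rg.total_rows,
    rg.deleted_rows
FROM sys.dm_db_column_store_row_group_physical_stats AS rg
WHERE rg.object_id = OBJECT_ID('dbo.TEST_NCCI_1');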

We also need a table to join to. A heap with a single row will serve that need:

DROP TABLE IF EXISTS dbo.JOIN_TO_ME;
CREATE TABLE dbo.JOIN_TO_ME (
    ID VARCHAR(10) NOT NULL
);

INSERT INTO dbo.JOIN_TO_ME VALUES ('1');

Suppose a different session has a long held exclusive table lock on TEST_NCCI_1. For example, the code below could be running in a different session with a yet to be committed transaction:

BEGIN TRANSACTION;

SELECT TOP (1) *
FROM dbo.TEST_NCCI_1 WITH (TABLOCKX);

Would you expect the following query to be blocked by the open transaction?

SELECT COUNT_BIG(*)
FROM dbo.TEST_NCCI_1 a WITH (NOLOCK)
LEFT OUTER JOIN dbo.JOIN_TO_ME b WITH (NOLOCK) ON a.ID = b.ID;

Test Results

I ran 45 tests in total by varying the isolation level (SET TRANSACTION ISOLATION LEVEL) and the locking hint at the table level. You may be wondering why I bothered to do so. According to the documentation, a table level locking hint overrides the isolation level for read operations:

Only one of the isolation level options can be set at a time, and it remains set for that connection until it is explicitly changed. All read operations performed within the transaction operate under the rules for the specified isolation level unless a table hint in the FROM clause of a statement specifies different locking or versioning behavior for a table.

This is not what I observed which is part of why I believe that the current behavior in SQL Server should be considered a bug. In the table below, “YES” means that the query was blocked and “NO” means the query was able to execute. I used red font for the behavior which I believe to be in error based on my understanding of expected locking behavior in SQL Server.

To summarize the results:

  • The query is always blocked under the read committed, repeatable read, and serializable isolation levels. The table locking hint does not matter.
  • Blocking behavior works as expected under the read uncommitted and snapshot isolation levels.
  • RCSI does not help here. With RCSI, to avoid getting blocked you need to set the transaction isolation level to READ UNCOMMITTED.
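For reference, each cell of that grid was a variation on the following sketch, with the SET statement and table hints swapped out per run (the snapshot tests also require ALLOW_SNAPSHOT_ISOLATION to be enabled on the database):

-- One cell of the test grid: read uncommitted plus NOLOCK hints.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT COUNT_BIG(*)
FROM dbo.TEST_NCCI_1 a WITH (NOLOCK)
LEFT OUTER JOIN dbo.JOIN_TO_ME b WITH (NOLOCK) ON a.ID = b.ID;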

I was able to use the lock_acquired extended event to see the problematic requested lock. In the screenshot below, the results are filtered to the object level. The first set of locks that occurred around the 11 second mark were under the read uncommitted isolation level. The second set of locks at around the 21 second mark were under the read committed isolation level. A table level NOLOCK hint was included for both queries.

To be clear, it is unexpected to see an IS object lock for a SELECT query with a table level NOLOCK hint. As far as I can tell, there are no interesting locks taken after the IS lock is acquired:

I attempted to investigate further by getting callstacks before the IS lock is acquired:

sqlmin.dll!XeSqlPkg::lock_acquired::Publish+0x228
sqlmin.dll!lck_lockInternal+0x1139
sqlmin.dll!LockAndCheckState+0x2c5
sqlmin.dll!GetHoBtLockInternal+0x445
sqlmin.dll!ColumnDataSetSession::WakeUp+0xa86
sqlmin.dll!NormalColumnDataSet::WakeUp+0xe
sqlmin.dll!ColumnDataSetSession::Create+0x183
sqlmin.dll!ColumnsetSS::WakeUpInternal+0x1ec
sqlmin.dll!ColumnsetSS::WakeUp+0x15e
sqlmin.dll!CreateDictionaryRowsetHelper+0x250
sqlmin.dll!CBpLocalDictionaryManager::GetSEStringDictionaryRowset+0x416
sqlmin.dll!CBpLocalDictionaryManager::FGetData+0xd1
sqlmin.dll!CBpDeepDataContextForBatch::GetStringValue+0x1d
sqlTsEs.dll!CTEsHashMultiData<167,1>::MdEsIntrinFn<CMDPureInput,CMDIteratorAllPureInputs>+0x13d
sqlTsEs.dll!CEsMDIntrinsicWrapper::UnaryImpl<&CTEsHashMultiData<167,1>::MdEsIntrinFn<CMDPureInput,CMDIteratorAllPureInputs>,&CTEsHashMultiData<167,1>::MdEsIntrinFn<CMDImpureInput,CMDIterator> >+0xd1
sqlTsEs.dll!CEsExecMultiData::GeneralEval+0x188
sqlTsEs.dll!CMultiDataEsRuntime::Eval+0x247
sqlmin.dll!CBpComputeMulticolumnHashRunTime::Eval+0x5b7
sqlmin.dll!CBpPartialJoin::ProcessProbeSide+0x445
sqlmin.dll!CBpQScanHashJoin::Main+0x148
sqlmin.dll!CBpQScanHashJoin::BpGetNextBatch+0x28
sqlmin.dll!CBpQScan::GetNextBatch+0x6f
sqlmin.dll!CBpChildIteratorScanner::BpGetNextBatch+0x12
sqlmin.dll!CBpQScanHashAggNew::ProcessInput+0x8b

I’m not terribly surprised to see a reference to a dictionary in the callstacks because I’m not able to reproduce the issue with an INT or BIGINT column. However, I have no idea where to go from here.

Workarounds

Adding an IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX query hint was an effective workaround for all of my test cases. I experienced the expected locking behavior for all test cases with that hint, so it is apparent to me that the issue is caused by the query plan reading from the NCCI.

Disabling batch mode using the DISALLOW_BATCH_MODE use hint seems to be an effective workaround as well, but I did not test this as thoroughly.
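Here's what both workarounds look like against the repro query (a sketch; the second relies on the DISALLOW_BATCH_MODE use hint, which requires SQL Server 2019 or later):

-- Workaround 1: keep the plan away from the NCCI entirely.
SELECT COUNT_BIG(*)
FROM dbo.TEST_NCCI_1 a WITH (NOLOCK)
LEFT OUTER JOIN dbo.JOIN_TO_ME b WITH (NOLOCK) ON a.ID = b.ID
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);

-- Workaround 2: keep the NCCI but run in row mode.
SELECT COUNT_BIG(*)
FROM dbo.TEST_NCCI_1 a WITH (NOLOCK)
LEFT OUTER JOIN dbo.JOIN_TO_ME b WITH (NOLOCK) ON a.ID = b.ID
OPTION (USE HINT('DISALLOW_BATCH_MODE'));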

Changing the isolation level may also be an acceptable way to avoid the unexpected blocking for some scenarios. For example, if you already have a NOLOCK table hint in place then changing the transaction isolation level to read uncommitted may not introduce any new data correctness issues.

Final Thoughts

For some queries, a simple join on a VARCHAR column between a table with a columnstore index and another table can lead to unexpected blocking, even with a table level NOLOCK hint or with read committed snapshot isolation enabled for the database. This is frustrating and disappointing behavior from a concurrency perspective. It is an odd scenario where enabling snapshot isolation offers a significant benefit over enabling read committed snapshot isolation. I hope that this locking behavior changes in a future version of SQL Server. Thanks for reading!

Signs You Need Batch Mode To Make Your SQL Server Queries Faster

My Name Is


I speak with a lot of DBAs and developers who have either heard nothing about column store and batch mode, or they’ve only heard the bare minimum and aren’t sure where it can help them.

Here’s a short list of reasons I usually talk through with people.

Your Reporting Queries Hit Big Tables

The bigger your tables get, the more likely you are to benefit, especially if the queries are unpredictable in nature. If you let people write their own, or design their own reports, nonclustered column store can be a good replacement for nonclustered row store indexes that were created specifically for reporting queries.

In row store indexes, index key column order matters quite a bit. That's not so much the case with column store. That makes them an ideal data source for queries, since they can scan and select from columns independently.
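As a sketch (hypothetical column choices; pick the ones your reports actually touch), the replacement can be a single wide index:

-- One NCCI can stand in for several reporting-specific row store
-- indexes, since key order doesn't matter the way it does for row store.
CREATE NONCLUSTERED COLUMNSTORE INDEX nc_reporting
ON dbo.Posts (OwnerUserId, PostTypeId, CreationDate, Score, ViewCount);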

Your Predicates Aren’t Always Very Selective

Picture the opposite of OLTP. Picture queries that collect large quantities of data and (usually) aggregate it down. Those are the ones that get along well with column store indexes and batch mode.

If most of your queries grab and pass around a couple thousand rows, you’re not likely to see a lot of benefit, here. You wanna target the ones with the big arrows in query plans.

Your Main Waits Are I/O And CPU

If you have a bunch of waits on blocking or something, this isn’t going to be your solve.

When your main waits are CPU, it could indicate that queries are overall CPU-bound. Batch mode is useful here, because for those “big” queries, you’re passing millions of rows around and making SQL Server send each one to CPU registers. Under batch mode, you can send up to 900 at a time. Just not in Standard Edition.

When your main waits are on I/O — reading pages from disk specifically — column store can be useful because of the compression they offer. It’s easy to visualize reading more compact structures being faster, especially when you throw in segment and column elimination.

Your Query Plans Have Some Bad Choices In Them

SQL Server 2019 (Enterprise Edition) introduced Batch Mode On Row Store, which let the optimizer heuristically select queries for Batch Mode execution. With that, you get some cool unlocks that you used to have to trick the optimizer into before 2019, like adaptive joins, memory grant feedback, etc.

While those things don’t solve every single performance issue, they can certainly help by letting SQL Server be a little more flexible with plan choices and attributes.

The Optimizer Keeps Choosing Cockamamie Parallel Sort Merge Plans That Make No Sense And Force You To Use Hash Join Hints All The Time

🤦‍♂️

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Dealing With Wide Missing Index Requests In SQL Server

We’ve All Been There


You’re running a query that selects a lot of columns, and you get a missing index request.

For the sake of brevity, let’s say it’s a query like this:

SELECT
    p.*
FROM dbo.Posts AS p
WHERE p.ParentId = 0;

The missing index request I get for this query is about like so:

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Posts] ([ParentId])
INCLUDE ([AcceptedAnswerId],[AnswerCount],[Body],[ClosedDate],[CommentCount],[CommunityOwnedDate],
[CreationDate],[FavoriteCount],[LastActivityDate],[LastEditDate],[LastEditorDisplayName],[LastEditorUserId],
[OwnerUserId],[PostTypeId],[Score],[Tags],[Title],[ViewCount])

But that’s laughable, because it’s essentially a shadow clustered index. It’s every column in the table ordered by <some column>.

And Again


Under many circumstances, you can trim all those included columns off and make sure there’s a usable index with ParentId as the leading column.

I’m not a fan of single key column indexes most of the time, so I’d avoid that practice.

But sure, if you have reasonably selective predicates, you’ll get a decent seek + lookup plan. That’s not always going to be the case, though, and for various reasons you may end up getting a poor-enough estimate on a reasonably selective predicate, which will result in a bad-enough plan.

Of course, other times you may not have very selective predicates at all. Take that query up there, for example. There are 17,142,169 rows in the Posts table (2013), and 6,050,820 of them qualify for our predicate on ParentId.

This isn’t a case where I’d go after a filtered index, either, because it’d only be useful for this one query. And it’d still be really wide.

There are four string columns in there, all nvarchar.

  • Title (250)
  • Tags (150)
  • LastEditorDisplayName (40)
  • Body (max)

Maybe Something Different


If I’m going to create an index like that, I want more out of it than I could get with the one that the optimizer asked for.

On a decently recent version of SQL Server (preferably Enterprise Edition), I’d probably opt for creating a nonclustered column store index here.

You get a lot of benefits from that, which you wouldn’t get from the row store index.

  • Column independence for searching
  • High compression ratio
  • Batch Mode execution

That means you can use the index for better searching on other predicates that aren’t terribly selective, the data source is smaller and less likely to be I/O bound, and batch mode is aces for queries that process a lot of rows.
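Here's roughly what that looks like for the Posts example (a sketch, not a prescription). Note that the nvarchar(max) Body column can't go into a nonclustered column store index, so it gets left out:

CREATE NONCLUSTERED COLUMNSTORE INDEX ncc_posts
ON dbo.Posts
(
    ParentId, AcceptedAnswerId, AnswerCount, ClosedDate, CommentCount,
    CommunityOwnedDate, CreationDate, FavoriteCount, LastActivityDate,
    LastEditDate, LastEditorDisplayName, LastEditorUserId, OwnerUserId,
    PostTypeId, Score, Tags, Title, ViewCount
    -- Body (nvarchar(max)) is not supported in an NCCI
);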

Column store indexes still have some weird limitations and restrictions. Especially around data types and included columns, I don’t quite understand why there isn’t better parity between clustered and nonclustered column store.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

I/O Bound vs CPU Bound Queries In SQL Server

Handcuffed


When you’re looking for queries to tune, it’s important to understand which part is causing the slowdown.

That’s why Actual Execution plans are so valuable in newer versions of SQL Server and SSMS. Getting to see operator timing and wait stats for a query can tell you a lot about what kind of problem you’re facing.

Let’s take a look at some examples.

Diskord


If you’re looking at a query plan, and all the time is spent way to the right, when you’re reading from indexes, it’s usually a sign of one or two things:

  • You’re missing a good index for the query
  • You don’t have enough memory to cache data relevant to the query

If you run the query a second time, and it has the same characteristics — meaning we should now have the data cached in the buffer pool but we don’t — then one or both of those two things is true.

If you run the query a second time and it’s fast because all the data we care about is cached, then it’s more likely that only the second thing is true.

[Query plan screenshot: "wasabi"]

For example, every time I run this query it takes 20 seconds, because each run has to read the clustered index from disk into memory to get a count. That's because my VM has 96 GB of RAM, and the clustered index of the Posts table is about 120 GB. I can't fit the whole thing into the buffer pool, so each run of this query has the gas station sushi effect on the buffer pool.

If I add a nonclustered index — and keep in mind I don’t really condone adding single key column nonclustered indexes like this — the query finishes much faster, because the smaller nonclustered index takes less time to read, and it fits into the buffer pool.

CREATE INDEX pr ON dbo.Posts
(
    Id
);

[Query plan screenshot: "birthday"]

If our query had different characteristics, like a where clause, join, group by, order by, or windowing function, I’d consider all of those things for the index definition. Just grabbing a count can still benefit from a smaller index, but there’s nothing relational that we need to account for here.

Proc Rock


Let’s say you already have ideal indexes for a query, but it’s still slow. Then what?

There are lots of possible reasons, but we’re going to examine what a CPU bound query looks like. A good example is one that needs to process a lot of rows, though not necessarily return a lot of rows, like a count or other aggregate.

[Query plan screenshot: "splish splash"]

While this query runs, CPUs are pegged like suburban husbands.

[Query plan screenshot: "in the middle of the street"]

For queries of this stature, inducing batch mode is often the most logical choice. Why? Because CPU instructions are run over batches of rows at once, rather than a single row at a time.

With a small number of rows — like in an OLTP workload — you probably won’t notice any real gains. But for this query that takes many millions of rows and produces an aggregate, it’s Hammer Time™

[Query plan screenshot: "known as such"]

Rather than ~30 seconds, we can get our query down to ~8 seconds without making a single other change to the indexes or written form.
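If you're not on a version or edition where batch mode on row store can kick in, one well-known way to induce batch mode is an empty filtered column store index: it stores no rows, but its presence makes the optimizer consider batch mode for queries against the table. A sketch with a hypothetical name:

-- The impossible predicate keeps the index empty, so it costs next to
-- nothing to maintain, while still unlocking batch mode consideration.
CREATE NONCLUSTERED COLUMNSTORE INDEX nc_batch_mode_trick
ON dbo.Posts (Id)
WHERE Id = -1 AND Id = -2;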

Under Compression


For truly large data sets, compressing indexes is a great choice for further reducing the I/O bound portions of queries. In SQL Server, you have row, page, and column store (clustered and nonclustered) compression available to you, based on the type of workload you're running.
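For the row store side, compression is set per index; a minimal sketch with hypothetical index and table names:

-- Rebuild an existing row store index with page compression.
ALTER INDEX ix_example ON dbo.Posts
REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Or squeeze an existing column store index further with archive compression.
ALTER INDEX cci_example ON dbo.PostsArchive
REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);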

When you’re tuning a query, it’s important to keep the type of bottleneck you’re facing in mind. If you don’t, you can end up trying to solve the wrong problem and getting nowhere.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

A Silent Bug With Clustered Column Store Indexes And Computed Columns In SQL Server 2019

Oh Boy


This bug has been reported to the proper authorities, and this blog post is not to dunk on the Summer Intern responsible for column store indexes, but to warn you to not do this, because I don’t know what it will break under the covers.

If you read the documentation for column store indexes, it says that column store indexes can’t be created on persisted computed columns.

And that’s true. If we step through this script, creating the column store index will fail.

/*Bye Felicia*/
DROP TABLE IF EXISTS dbo.cci;
GO 

/*Hi Felicia*/
CREATE TABLE dbo.cci(id int, fk int, sd date, ed datetime);
GO 

/*COMPUTER!*/
ALTER TABLE dbo.cci ADD cc AS id * 2;
ALTER TABLE dbo.cci ALTER COLUMN cc ADD PERSISTED;
GO 

/*Rats*/
CREATE CLUSTERED COLUMNSTORE INDEX c ON dbo.cci;
GO

The error message we get is pretty clear:

Msg 35307, Level 16, State 1, Line 76
The statement failed because column 'cc' on table 'cci' is a computed column. 
Columnstore index cannot include a computed column implicitly or explicitly.

80 Blocks from Tiffany’s


However, if we change the order of things a little bit, we’ll find that we can add a persisted computed column to a table with a clustered column store index on it.

/*Bye Felicia*/
DROP TABLE IF EXISTS dbo.cci;
GO 

/*Hi Felicia*/
CREATE TABLE dbo.cci(id int, fk int, sd date, ed datetime, INDEX c CLUSTERED COLUMNSTORE);
GO 

/*???*/
ALTER TABLE dbo.cci ADD cc AS id * 2;
ALTER TABLE dbo.cci ALTER COLUMN cc ADD PERSISTED;
GO 

/*With data?*/
INSERT dbo.cci
(
    id, fk, sd, ed
)
VALUES
(
    10, 10, GETDATE(), GETDATE()  
);

/*yepyepyepyepyepyep*/
SELECT
    c.*
FROM dbo.cci AS c

/*Eh?*/
ALTER INDEX c ON dbo.cci REBUILD;
ALTER INDEX c ON dbo.cci REORGANIZE;
DBCC CHECKTABLE('dbo.cci');

And everything seems to work. However, if we drop the column store index, it can’t be re-created.
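To see that for yourself, continue the script above; the create should fail with Msg 35307, same as the first script:

/*Works fine*/
DROP INDEX c ON dbo.cci;
GO

/*Fails, because now the persisted computed column is checked*/
CREATE CLUSTERED COLUMNSTORE INDEX c ON dbo.cci;
GO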

The key here is that the computed column is added, and then the persisted attribute is added second. That is where an error should be thrown.

But what do I know? I’m just a bouncer.

Bomber


Again, I’m telling you not to do this. I’m telling you that it might break things in gruesome and horrible ways.

I don’t think that this is a preview of upcoming support for persisted computed columns in clustered column store indexes.

Thanks for reading!

p.s. Really, don’t do this

p.p.s. If you do this you’re asking for trouble

p.p.p.s. This was fixed in CU12

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

How Useful Is Column Store Indexing In SQL Server Standard Edition?

Speed Limit


When I’m blogging about performance tuning, most of it is from the perspective of Enterprise Edition. That’s where you need to be if you’re serious about getting SQL Server to go as fast as possible. Between the unrealistic memory limits and other feature restrictions, Standard Edition just doesn’t hold up.

Sure, you can probably get by with it for a while, but once performance becomes a primary concern it’s time to fork over an additional 5k a core for the big boat.

They don’t call it Standard Edition because it’s The Standard, like the hotel. Standard is a funny word like that. It can denote either high or low standing through clever placement of “the”.  Let’s try an experiment:

  • Erik’s blogging is standard for technical writing
  • Erik’s blogging is the standard for technical writing

Now you see where you stand with standard edition. Not with “the”, that’s for sure. “The” has left the building.

Nerd Juice


A lot of the restrictions for column store in Standard Edition are documented, but to recap:

  • DOP limit of two for queries
  • No parallelism for creating or rebuilding indexes
  • No aggregate pushdown
  • No string predicate pushdown
  • No SIMD support

Here’s a comparison for creating a nonclustered column store index in Standard and Enterprise/Developer Editions:

[Query plan screenshot: "your fly is down"]

The top plan is from Standard Edition, and runs for a minute in a full serial plan. There is a non-parallel plan reason in the operator properties: MaxDOPSetToOne.

I do not have DOP set to one anywhere, that’s just the restriction kicking in. You can try it out for yourself if you have Standard Edition sitting around somewhere. I’m doing all my testing on SQL Server 2019 CU9. This is not ancient technology at the time of writing.

The bottom plan is from Enterprise/Developer Edition, where the plan is able to run partially in parallel, and takes 28 seconds (about half the time of the serial plan).

Query Matters


One of my favorite query tuning tricks is getting batch mode to happen on queries that process a lot of rows. It doesn’t always help, but it’s almost always worth trying.

The problem is that on Standard Edition, if you’re processing a lot of rows, being limited to a DOP of 2 can be a real hobbler. In many practical cases, a batch mode query at DOP 2 will end up around the same as a row mode query at DOP 8. It’s pretty unfortunate.

In some cases, it can end up being much worse.

SELECT 
    MIN(p.Id) AS TinyId,
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p WITH(INDEX = ncp)
JOIN dbo.Votes AS v
    ON p.Id = v.PostId
WHERE p.OwnerUserId = 22656;

SELECT 
    MIN(p.Id) AS TinyId,
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p WITH(INDEX = 1)
JOIN dbo.Votes AS v
    ON p.Id = v.PostId
WHERE p.OwnerUserId = 22656;

Here’s the query plan for the first one, which uses the nonclustered column store index on Posts. There is no hint or setting that’s keeping DOP at 2; this really is just a feature restriction.

[Query plan screenshot: "drop it like it's dop"]

Higher Ground


The second query, which is limited by the MAXDOP setting to 8, turns out much faster. The batch mode query takes 3.8 seconds, and the row mode query takes 1.4 seconds.

[Query plan screenshot: "it's a new craze"]

In Enterprise Edition, there are other considerations for getting batch mode going, like memory grant feedback or adaptive joins, but those aren’t available in Standard Edition.

In a word, that sucks.

Dumb Limit


The restrictions limiting column store index creates and rebuilds to DOP 1 (both clustered and nonclustered), and queries to DOP 2, all seem even more odd when we consider that there is no restriction on inserting data into a table with a column store index on it.

As an example:

SELECT 
    p.*
INTO dbo.PostsTestLoad
FROM dbo.Posts AS p
WHERE 1 = 0;

CREATE CLUSTERED COLUMNSTORE INDEX pc ON dbo.PostsTestLoad;

SET IDENTITY_INSERT dbo.PostsTestLoad ON;

INSERT dbo.PostsTestLoad WITH(TABLOCK)
(
    Id, AcceptedAnswerId, AnswerCount, Body, ClosedDate, 
    CommentCount, CommunityOwnedDate, CreationDate, 
    FavoriteCount, LastActivityDate, LastEditDate, 
    LastEditorDisplayName, LastEditorUserId, OwnerUserId, 
    ParentId, PostTypeId, Score, Tags, Title, ViewCount 
)
SELECT TOP (1024 * 1024)
    p.Id, p.AcceptedAnswerId, p.AnswerCount, p.Body, p.ClosedDate,
    p.CommentCount, p.CommunityOwnedDate, p.CreationDate,
    p.FavoriteCount, p.LastActivityDate, p.LastEditDate,
    p.LastEditorDisplayName, p.LastEditorUserId, p.OwnerUserId,
    p.ParentId, p.PostTypeId, p.Score, p.Tags, p.Title, p.ViewCount
FROM dbo.Posts AS p;

SET IDENTITY_INSERT dbo.PostsTestLoad OFF;

[Query plan screenshot: "smells like dop spirit"]

Unsupportive Parents


These limits are asinine, plain and simple, and I hope at some point they’re reconsidered. While I don’t expect everything from Standard Edition, because it is Basic Cable Edition, I do think that some of the restrictions go way too far.

Perhaps an edition somewhere between Standard and Enterprise would make sense. When you line the two up, the differences in available features and pricing are incredibly stark.

There are often mixed needs as well, where some people need Standard Edition with fewer HA restrictions, and some people need it with fewer performance restrictions.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.