Even If SQL Server Table Variables Were Always In Memory, It Wouldn’t Make Them Better Than Temp Tables

Past Perfect


You know all those pesky developer myths that never go away?

  • CTEs and Views can’t use indexes: WRONG
  • Subqueries are slower than joins: WRONG
  • Temp tables are bad for performance: WRONG
  • Table Variables are always in memory: WRONG

Okay, that’s not a complete list, but a complete list would fill the internet to its very brim and cause all sorts of overflow errors.

My job as a performance tuning consultant is to teach folks when they’ve been led astray.

To confuse things a bit further, Microsoft allows for in-memory @table variables, but you have to do a lot of work to get set up to use them.

They’re not the ones you get out of the box.
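For reference, here’s a rough sketch of the setup those require. None of this is from the environment above: the database name, file path, and type name are placeholders.

/*Memory-optimized anything needs a memory-optimized filegroup and container*/
ALTER DATABASE YourDatabase 
    ADD FILEGROUP imoltp CONTAINS MEMORY_OPTIMIZED_DATA;
ALTER DATABASE YourDatabase 
    ADD FILE (NAME = 'imoltp_data', FILENAME = 'C:\Data\imoltp_data') 
    TO FILEGROUP imoltp;
GO

/*Then you need a memory-optimized table type, which must have an index*/
CREATE TYPE dbo.InMemoryIds AS TABLE
(
    Id int NOT NULL INDEX ix_id NONCLUSTERED
)
WITH (MEMORY_OPTIMIZED = ON);
GO

/*Only @table variables declared with a type like that live in memory*/
DECLARE @t dbo.InMemoryIds;
INSERT @t (Id) VALUES (1), (2), (3);
SELECT t.Id FROM @t AS t;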

Future Mediocre


Let’s say for the sake of argument that @table variables were always more “in-memory” than #temp tables (they’re not). They’d still have a lot of the same fundamental problems:

  • No column-level statistics, which leads to poor cardinality estimates
  • Modifications to them can’t use parallelism
  • No statistics also means no recompiles when row counts change, so plans don’t adapt to how many rows they hold

With all that stuff in mind: who cares if they’re only in memory? There are so many downsides for most query patterns, they could all happen in quantum space and performance would still stink.

I know, they have their uses:

  • Their contents survive transaction rollbacks
  • They cause fewer recompiles in very hot code paths
  • They’re fine for tiny, predictable row counts

But most of y’all aren’t using them that wisely.

And here’s the other thing: Most of y’all don’t have enough memory to begin with, and the memory you do have you misuse in a million ways. Why do you wanna make that worse?

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

How SQL Server 2019 Helps You Find Queries That Have Missing Index Requests

bigtime


In my quest to get people to upgrade to a not-old-and-busted version of SQL Server, this is a feature I talk about a lot, because it really helps folks who don’t have all the time in the world to tune queries and indexes.

Here’s a quick helper query to get you started:

SELECT TOP (50)
    query_text = 
        SUBSTRING
        (
            st.text,
            qs.statement_start_offset / 2 + 1,
            (
                CASE qs.statement_end_offset
                     WHEN -1 
                     THEN DATALENGTH(st.text)
                     ELSE qs.statement_end_offset
                END - qs.statement_start_offset
            ) / 2 + 1
        ),
    qp.query_plan,
    qs.creation_time,
    qs.last_execution_time,
    qs.execution_count,
    qs.max_worker_time,
    avg_worker_time = 
        (qs.total_worker_time / qs.execution_count),
    qs.max_grant_kb,
    qs.max_used_grant_kb,
    qs.total_spills
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
CROSS APPLY
(
    SELECT TOP (1)
        gqs.*
    FROM sys.dm_db_missing_index_group_stats_query AS gqs
    WHERE qs.query_hash = gqs.query_hash
    AND   qs.query_plan_hash = gqs.query_plan_hash
    AND   qs.sql_handle = gqs.last_sql_handle
    ORDER BY
        gqs.last_user_seek DESC,
        gqs.last_user_scan DESC
) AS gqs
ORDER BY qs.max_worker_time DESC
OPTION(RECOMPILE);

This should help you find queries that use a lot of CPU and might could oughtta use an index.

Note that this script does not assemble the missing index definition for you. That stuff is all readily available in the query plans that get returned here, and of course the missing index feature has many caveats and limitations to it.
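If you want to see the raw suggestions outside of a query plan, the classic missing index DMVs are still around, caveats and all. This is a generic example of joining them, not part of the script above:

SELECT
    mid.statement AS table_name,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.user_seeks,
    migs.avg_user_impact
FROM sys.dm_db_missing_index_group_stats AS migs
JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_group_handle = migs.group_handle
JOIN sys.dm_db_missing_index_details AS mid
    ON mid.index_handle = mig.index_handle
ORDER BY migs.user_seeks DESC;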

You should, as often as possible, execute the query and collect the actual execution plan to see where the time is spent before adding anything in.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Unkillable Sessions and Undetected Deadlocks

I recently experienced a blocking issue in a production environment which had been going on for hours. As a responsible DBA, I tried to kill the head of the blocking chain. Unexpectedly, killing the session seemed to have no effect. The blocking continued and the session seemed to stick around.

How to Create an Unkillable Session

It’s surprisingly easy to create a session that seemingly can’t be killed. Use good judgment in picking an environment to run this code against. First you’ll need to enable the “Ad Hoc Distributed Queries” configuration option if you haven’t done so already:

/* Ad Hoc Distributed Queries is an advanced option, so show advanced options first */
sp_configure 'show advanced options', 1;
RECONFIGURE;
GO
sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;
GO
sp_configure;

You can then run the following code after replacing the SERVER_NAME and INSTANCE_NAME placeholder values with the server name and instance name that you’re running against. In other words, you want the OPENROWSET call to point at the same server that you’re executing the query against:

BEGIN TRANSACTION;

SELECT TOP (1) [name]
FROM master.dbo.spt_values WITH (TABLOCKX);

SELECT TOP (1) d.[name]
FROM OPENROWSET('SQLNCLI', 'Server={{SERVER_NAME}}\{{INSTANCE_NAME}};Trusted_Connection=yes;', master.dbo.spt_values) AS d;

This code seems to run forever. Trying to kill the session is not effective: there’s no error message, and a line is written to the error log claiming that the session was killed, but the blocking continues. Running KILL with the STATUSONLY option gives us the following:

SPID 56: transaction rollback in progress. Estimated rollback completion: 0%. Estimated time remaining: 0 seconds.

Trying to cancel the query using SSMS seems to have no effect either.

The Undetected Deadlock

Running sp_whoisactive reveals a pretty alarming state:

Session 56 (the one that I can see in SSMS) is waiting on an OLEDB wait. Based on the documentation, I assume that you’ll get this wait type while waiting for the OPENROWSET call to complete:

Includes all connection information that is required to access remote data from an OLE DB data source. This method is an alternative to accessing tables in a linked server and is a one-time, ad hoc method of connecting and accessing remote data by using OLE DB.

I did get callstacks for the OLEDB wait using the wait_info_external extended event, but there were two problems. The first problem is that wait_info_external fires after the wait is complete and the OLEDB wait won’t finish. The second problem is that the callstacks that I did get weren’t very interesting.

Getting back on topic, I believe that the OLEDB wait for session 56 can only end once session 66 completes its work. Session 66 is the session that was created by the OPENROWSET call. However, session 66 needs an IS lock on the spt_values table. That is not compatible with the exclusive lock held on that object by session 56. So session 56 is waiting on session 66 but session 66 is waiting on session 56. This is an undetected deadlock. The deadlock can be resolved by killing session 66 (the OPENROWSET query).
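If you want to find that second session yourself, something along these lines works. The session ids below are just the ones from this example:

/*Find sessions blocked by the unkillable head blocker*/
SELECT r.session_id, r.blocking_session_id, r.wait_type, s.program_name
FROM sys.dm_exec_requests AS r
INNER JOIN sys.dm_exec_sessions AS s
    ON s.session_id = r.session_id
WHERE r.blocking_session_id = 56;

/*Kill the session created by the OPENROWSET call instead of the head blocker*/
KILL 66;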

Note that if you are crazy enough to try this on your own, you may see a MEMORY_ALLOCATION_EXT wait instead of an OLEDB wait. I don’t believe that there’s any meaningful distinction there, but don’t assume that a long running OLEDB wait is required for this problem to occur.

Final Thoughts

This blocking issue was unusual in that killing the blocking session doesn’t resolve the issue. I had to kill the session getting blocked instead. Thanks for reading!

Why Some Types of Parallel Scans can be Slow on Cloud Storage

Upon reading the title, you may be thinking that of course parallel scans will be slow in the cloud. Cloud storage simply isn’t very fast. I would argue that there’s a bit more to it.

The Timeout

A query timed out the other day in production after running for 600 seconds. Of note, one of the tables used by the stored procedure is in a database with data files hosted by Azure blob storage. I grabbed the wait stats for the timed out query using query store:
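For reference, pulling wait stats for a single query out of Query Store looks roughly like this (the query_id filter is a placeholder, not the production query’s id):

SELECT
    ws.wait_category_desc,
    SUM(ws.total_query_wait_time_ms) AS total_wait_time_ms
FROM sys.query_store_wait_stats AS ws
INNER JOIN sys.query_store_plan AS p
    ON p.plan_id = ws.plan_id
WHERE p.query_id = 1 /*placeholder for the timed out query*/
GROUP BY ws.wait_category_desc
ORDER BY total_wait_time_ms DESC;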

According to the documentation, the latch category maps to LATCH_% wait types and the preemptive category maps to PREEMPTIVE_% wait types. I wasn’t able to reproduce the timeout when executing the same query text, even if I forced the exact same query plan with a USE PLAN. Admittedly, I was puzzled by the wait stats distribution for a while.

Decoding the Categories

I eventually realized that typically the first execution of the stored procedure for the business day tended to be the slowest. Sorting query store runtime DMV results by time:

I also noticed that the timed out executions tended to have higher physical I/O than the other executions. Sorting query store runtime DMV results by I/O:
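That kind of check against the runtime stats DMVs looks something like this (again, the query_id filter is a placeholder):

SELECT
    rs.last_execution_time,
    rs.count_executions,
    rs.avg_duration / 1000. AS avg_duration_ms,
    rs.avg_physical_io_reads
FROM sys.query_store_runtime_stats AS rs
INNER JOIN sys.query_store_plan AS p
    ON p.plan_id = rs.plan_id
WHERE p.query_id = 1 /*placeholder for the timed out query*/
ORDER BY rs.avg_physical_io_reads DESC;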

I now finally had a method to reproduce the poor performance of the stored procedure: being the first one to run the code in the morning, presumably when the buffer pool didn’t have anything helpful for this stored procedure. Early in the morning, I was able to capture an actual plan that took 13 minutes at MAXDOP 8. Nearly all of the execution time is spent on an index scan for a table that is hosted on cloud storage:

Here are the top wait stats from the actual plan:

<Wait WaitType="LATCH_EX" WaitTimeMs="5423928" WaitCount="1971" />
<Wait WaitType="PAGEIOLATCH_SH" WaitTimeMs="977288" WaitCount="29210" />
<Wait WaitType="CXPACKET" WaitTimeMs="809678" WaitCount="700" />
<Wait WaitType="PREEMPTIVE_HTTP_REQUEST" WaitTimeMs="43801" WaitCount="50344" />

I used the task DMVs during query execution to get information about the latch type:
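(The exact query isn’t preserved here, but sys.dm_os_waiting_tasks is the usual way to see the latch class while the query is running; the session id below is a placeholder.)

SELECT
    owt.session_id,
    owt.exec_context_id,
    owt.wait_type,
    owt.wait_duration_ms,
    owt.resource_description /*contains the latch class for LATCH_% waits*/
FROM sys.dm_os_waiting_tasks AS owt
WHERE owt.session_id = 56 /*placeholder for the session running the scan*/
ORDER BY owt.exec_context_id;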

 

Now I know that the query spends most of its time trying to acquire exclusive latches of the ACCESS_METHODS_DATASET_PARENT type. There is also a fair amount of I/O wait time compared to the mysterious PREEMPTIVE_HTTP_REQUEST wait type.

The ACCESS_METHODS_DATASET_PARENT Latch

I asked the community if this latch was considered to be interesting. The winning option was “What?”. As usual, twitter is useless:

For my query, at times I observed seven out of eight worker threads all waiting for an exclusive latch at the same time. This isn’t surprising considering that the query averaged 6.7 LATCH_EX wait seconds per second. The other thread was typically doing an I/O wait while the rest were waiting for the latch, so perhaps the worker threads were doing I/Os while holding the latch resource. In a development environment (with unfortunately significantly better I/O), I mocked up a roughly similar table and enabled the latch_acquired, latch_released, file_read, and file_read_completed extended events. The results were quite educational:
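(The event session definition wasn’t included; a sketch of how you might set it up, with a placeholder session filter and file target name, is below.)

CREATE EVENT SESSION latch_io_pattern ON SERVER
ADD EVENT sqlserver.latch_acquired
    (WHERE (sqlserver.session_id = 56)),
ADD EVENT sqlserver.latch_released
    (WHERE (sqlserver.session_id = 56)),
ADD EVENT sqlserver.file_read
    (WHERE (sqlserver.session_id = 56)),
ADD EVENT sqlserver.file_read_completed
    (WHERE (sqlserver.session_id = 56))
ADD TARGET package0.event_file
    (SET filename = N'latch_io_pattern');
GO

ALTER EVENT SESSION latch_io_pattern ON SERVER STATE = START;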

There were two different patterns in the XE files. Sometimes the worker threads would acquire the latch, perform one or more I/Os, and release the latch. Other times they would acquire the latch and quickly release it without performing any I/Os. Paul Randal recently wrote the following about this latch:

When either a heap or an index is being accessed, internally there’s an object called a HeapDataSetSession or IndexDataSetSession, respectively. When a parallel scan is being performed, the threads doing the actual work of the scan each have a “child” dataset (another instance of the two objects I just described), and the main dataset, which is really controlling the scan, is called the “parent.”

When one of the scan worker threads has exhausted the set of rows it’s supposed to scan, it needs to get a new range by accessing the parent dataset, which means acquiring the ACCESS_METHODS_DATASET_PARENT latch in exclusive mode. While this can seem like a bottleneck, it’s not really, and there’s nothing you can do to stop the threads performing a parallel scan from occasionally showing a LATCH_EX wait for this latch.

At a very high level, I suspect that the fast latch releases (marked in blue in the above picture) occur when the worker thread can acquire an already available range of rows from the parent object. The slow latch releases (marked in red) occur when the worker thread tries to acquire a new range, can’t find one, and has to perform I/Os to add more ranges to the parent object. In both cases the parent object needs to be modified, so an exclusive latch is always taken. I don’t know how it actually works behind the scenes, but the theory matches the observed behavior.

In summary, the query does a parallel scan which is protected by the ACCESS_METHODS_DATASET_PARENT latch. The table getting scanned is hosted on cloud storage with high I/O latency. Latency being high contributes to the exclusive latch on ACCESS_METHODS_DATASET_PARENT getting held for a long time which can block all of the other parallel worker threads from proceeding.

The PREEMPTIVE_HTTP_REQUEST Wait Type

I already have enough information to be confident in a query tuning fix for the stored procedure, but for completeness, I also investigated the PREEMPTIVE_HTTP_REQUEST wait type. I was not able to find any useful documentation for this wait type. The official documentation only says “Internal use only.” Thanks, Microsoft.

A common technique to learn more about undocumented waits is to get callstacks associated with the wait type. I’ve seen the PREEMPTIVE_HTTP_REQUEST wait occur in Azure SQL databases, Azure SQL Managed Instances, and for databases hosted in Azure blob storage. It is possible to get callstacks for managed instances, but it is not possible to translate them because Microsoft does not release public symbols for managed instances. However, the blob storage scenario did allow me to get and translate call stacks for the wait. Below are a few of the stacks that I captured:

Performing I/Os to read data not in the buffer pool:

sqldk!XeSosPkg::wait_info_external::Publish
sqldk!SOS_Task::PushWait
sqlmin!SOS_Task::AutoSwitchPreemptive::AutoSwitchPreemptive
sqlmin!BlobAccess::AsyncHttpIOInternal
sqlmin!BlobAccess::AsyncIOInternal
sqlmin!XFCB2::ScatterReadInternal
sqlmin!FCB::ScatterRead
sqlmin!RecoveryUnit::ScatterRead
sqlmin!BPool::GetFromDisk
sqlmin!BPool::Get
sqlmin!BlobBase::FixNode
sqlmin!BlobManager::LocateBlob
sqlmin!BlobManager::OpenInternal
sqlmin!BlobManager::Open
sqlmin!LockBytesSS::Open
sqlmin!OpenLockBytesNew
sqlmin!OpenIlbssWithSeLobHandleRO
sqlmin!IndexDataSetSession::OpenStreamWithSeLobHandle
sqlmin!RowsetNewSS::OpenWithSeLobHandle
sqlmin!CBlobHandleBaseTableFactory::PilbGet
sqlTsEs!CompareLegacyLobNull
sqlTsEs!CTEsCompare<35,35>::BlCompareXcArgArgImpl
sqlTsEs!CEsExec::GeneralEval4
sqlmin!CQScanStreamAggregateNew::GetRowHelper

Writing LOB data to disk:

sqldk!XeSosPkg::wait_info_external::Publish
sqldk!SOS_Task::PushWait
sqlmin!SOS_Task::AutoSwitchPreemptive::AutoSwitchPreemptive
sqlmin!BlobAccess::AsyncHttpIOInternal
sqlmin!BlobAccess::AsyncIOInternal
sqlmin!XFCB2::ScatterReadInternal
sqlmin!FCB::ScatterRead
sqlmin!RecoveryUnit::ScatterRead
sqlmin!BPool::GetFromDisk
sqlmin!BPool::Get
sqlmin!BlobBase::FixNode
sqlmin!Blob::RWBlob
sqlmin!Blob::RWBlob
sqlmin!Blob::ReadWriteAt
sqlmin!BlobManager::ReadAtInternal
sqlmin!BlobManager::ReadAt
sqlmin!LockBytesSS::ReadAt
sqllang!write_LOB_chunked
sqllang!`anonymous namespace'::TSendRowClassNoCount<7,0>::TSendUnlimitedVartype<0>
sqllang!CTds74::SendRowImpl
sqlTsEs!CEsExec::GeneralEval
sqllang!CXStmtQuery::ErsqExecuteQuery
sqllang!CXStmtSelect::XretExecute
sqllang!CMsqlExecContext::ExecuteStmts<1,1>

Transaction log writes:

sqldk!XeSosPkg::wait_info_external::Publish
sqldk!SOS_Scheduler::UpdateWaitTimeStats
sqldk!SOS_Task::PopWait
sqlmin!SOS_ExternalAutoWait::~SOS_ExternalAutoWait
sqlmin!SOS_Task::AutoSwitchPreemptive::~AutoSwitchPreemptive
sqlmin!BlobAccess::AsyncHttpIOInternal
sqlmin!BlobAccess::AsyncIOInternal
sqlmin!XFCB2::AsyncWriteInternal
sqlmin!FCB::AsyncWrite
sqlmin!SQLServerLogMgr::FlushLCOld
sqlmin!SQLServerLogMgr::LogWriter
sqldk!SOS_Task::Param::Execute
sqldk!SOS_Scheduler::RunTask
sqldk!SOS_Scheduler::ProcessTasks
sqldk!SchedulerManager::WorkerEntryPoint
sqldk!SystemThreadDispatcher::ProcessWorker
sqldk!SchedulerManager::ThreadEntryPoint
KERNEL32.DLL+0x0000000000017974
ntdll.dll+0x000000000006A271

I’m guessing this one is renewing the lease as noted in the documentation: “the implementation of SQL Server Data Files in Azure Storage feature requires a renewal of blob lease every 45 to 60 seconds implicitly”

sqldk!XeSosPkg::wait_info_external::Publish
sqldk!SOS_Task::PushWait
sqlmin!SOS_Task::AutoSwitchPreemptive::AutoSwitchPreemptive
sqlmin!BlobAccess::AsyncHttpIOInternal
sqlmin!BlobAccess::AsyncIOInternal
sqlmin!LeaseRenewalManager2::RenewLeases
sqlmin!XStoreLeaseRenewalTask2
sqldk!SOS_Task::Param::Execute
sqldk!SOS_Scheduler::RunTask
sqldk!SOS_Scheduler::ProcessTasks
sqldk!SchedulerManager::WorkerEntryPoint
sqldk!SystemThreadDispatcher::ProcessWorker
sqldk!SchedulerManager::ThreadEntryPoint

Not surprisingly, all of them have to do with accessing azure storage. I was able to observe an interesting pattern during a table scan: every PAGEIOLATCH_SH wait was immediately preceded by a PREEMPTIVE_HTTP_REQUEST wait.

Getting stacks for the waits that are close together, with shared code removed and the I/O wait on top with the preemptive wait on the bottom:

sqldk!XeSosPkg::wait_info::Publish
sqldk!SOS_Task::PreWait
sqldk!WaitableBase::Wait
sqlmin!LatchBase::Suspend
sqlmin!LatchBase::AcquireInternal
sqlmin!BUF::AcquireLatch
sqlmin!BPool::Get
<<SHARED STACKS REMOVED>>

sqldk!XeSosPkg::wait_info_external::Publish
sqldk!SOS_Task::PushWait
sqlmin!SOS_Task::AutoSwitchPreemptive::AutoSwitchPreemptive
sqlmin!BlobAccess::AsyncHttpIOInternal
sqlmin!BlobAccess::AsyncIOInternal
sqlmin!XFCB2::ScatterReadInternal
sqlmin!FCB::ScatterRead
sqlmin!RecoveryUnit::ScatterRead
sqlmin!BPool::GetFromDisk
sqlmin!BPool::Get
sqlmin!BlobBase::FixNode
sqlmin!BlobManager::LocateBlob
sqlmin!BlobManager::OpenInternal
sqlmin!BlobManager::Open
sqlmin!LockBytesSS::Open
sqlmin!OpenLockBytesNew
sqlmin!OpenIlbssWithSeLobHandleRO
sqlmin!IndexDataSetSession::OpenStreamWithSeLobHandle
sqlmin!RowsetNewSS::OpenWithSeLobHandle
sqlmin!CBlobHandleBaseTableFactory::PilbGet
sqlTsEs!CompareLegacyLobNull

My understanding is that SQL Server makes an HTTP request to the blob storage in preemptive mode in order to queue an asynchronous I/O. The preemptive wait ends and the PAGEIOLATCH_SH wait begins shortly after. The traditional I/O wait ends when the I/O is received by SQL Server. That’s just a guess though.

I think a reasonable summary is that you should expect to see PREEMPTIVE_HTTP_REQUEST waits if you’re performing I/O against Azure storage. They are a normal part of SQL Server’s operations.

Reducing ACCESS_METHODS_DATASET_PARENT Latch Wait Time

In some situations, a query that is slow due to I/O performance can run faster with a higher DOP. Instead of one thread waiting for I/O you can have many threads waiting and overall runtime will decrease. Of course, this won’t work in all scenarios, such as if you’re hitting an IOPs limit. For my scenario, the DOP 8 query had 6.7 LATCH_EX wait seconds per second of runtime. Latch contention is so bad that it nearly steals all of the theoretical benefits achieved with query parallelism.

The obvious solution is to reduce the amount of I/O that is done by the query. That is what was indeed done to resolve the production issue. Another obvious solution is to improve I/O performance. More on that in the next section. However, in some cases the query may simply need to scan a lot of data and you can’t do anything about storage performance. In that situation, it may sometimes be practical to change the type of parallel scan performed by SQL Server.

Consider the following parallel plan against a partitioned table on the probe side:

For clarity, here is what the T-SQL looks like:

SELECT SUM(CASE WHEN OffPageFiller IS NOT NULL THEN 1 ELSE 0 END)
FROM (
      VALUES (1),(2),(3),(4),(5),(6)
) v(v)
CROSS APPLY(
       SELECT OffPageFiller
       FROM dbo.SmallTable s
       INNER JOIN dbo.BigTablePart b ON s.JoinId = b.JoinId
       WHERE b.PartId = v.v
) q;

Each thread of the parallel nested loop operator gets assigned an entire partition and reads all of the data for that partition. There is no longer a need for the ACCESS_METHODS_DATASET_PARENT latch because the shared state between worker threads has been removed. I checked using extended events and found that the ACCESS_METHODS_DATASET_PARENT latch wait count was significantly decreased as expected. I suspect that the remaining latch acquisitions were from the hash build side scan. This type of access pattern requires a partitioned table, can be a pain to express in T-SQL, and may not work well if partitions are unevenly sized or if there aren’t enough partitions compared to DOP to get good demand-based distribution.

Perhaps a more reasonable option is to switch to columnstore instead. The query will do less I/O and I assume that the latch pattern is quite different, but I didn’t test this. Testing things in the cloud costs money. Maybe I should start a Patreon.

Reducing PREEMPTIVE_HTTP_REQUEST Wait Time

As I said earlier, I believe that seeing this wait is normal if you’re using azure storage. In my limited experience with it, any noticeable wait time was always tied to even longer PAGEIOLATCH waits. Tuning the query using traditional methods to reduce I/O wait time was always sufficient and I never had a need to specifically focus on PREEMPTIVE_HTTP_REQUEST. However, if you really need to focus on this wait type in particular for some reason, I’ll theorize some ways to reduce it:

  1. For all platforms, reduce I/O performed by the query using query tuning, schema changes, and so on.
  2. For managed instances, move your I/O from user databases to tempdb (it uses locally attached storage).
  3. For azure blob storage databases, make sure that you are using the right geographical region for your storage.

Here are a few more options which may not specifically reduce PREEMPTIVE_HTTP_REQUEST wait time but may improve I/O performance overall:

  1. For managed instances and azure blob storage databases, try increasing the size of your data files.
  2. For managed instances, try increasing the number of data files.
  3. For all platforms, switch to a more expensive tier of storage.

Final Thoughts

The query timed out in production because it performed a large parallel scan against a table that had little to no data in the buffer pool. Due to the high latency of Azure blob storage, PAGEIOLATCH waits were not the dominant wait type: that latency caused latch contention on the ACCESS_METHODS_DATASET_PARENT latch instead. My parallel scan was slow because cloud storage isn’t very fast. Maybe it isn’t so complicated after all?

Using Batch Mode for the “Matching Supply With Demand” Challenge

Itzik Ben-Gan posted an interesting T-SQL challenge on SQL performance dot com. I’m writing up my solution in my own blog post because I have a lot to say and getting code formatting right can be tricky in blog post comments. For reference, my test machine is using SQL Server 2019 CU14 with an i7-9700K CPU @ 3.60 GHz processor. The baseline cursor solution completes in 8465 ms on my machine.

Running Totals

A simple way to solve this problem is to calculate running totals of quantity separately for supply and demand, treat the resulting rows as intervals, and find the intersections between supply and demand intervals. It is fairly straightforward and fast to calculate the interval start and end points and load them into temp tables, so I’ll omit that code for now. We can find the intersecting intervals with the query below (which you should not run):

SELECT d.ID DemandId, s.ID as SupplyID,
CASE
WHEN d.IntervalEnd >= s.IntervalEnd THEN s.IntervalEnd - CASE WHEN s.IntervalStart >= d.IntervalStart THEN s.IntervalStart ELSE d.IntervalStart END
ELSE d.IntervalEnd - CASE WHEN s.IntervalStart >= d.IntervalStart THEN s.IntervalStart ELSE d.IntervalStart END
END TradeQuantity
FROM #demand_intervals d
CROSS JOIN #supply_intervals s
WHERE d.IntervalEnd > s.IntervalStart AND d.IntervalStart < s.IntervalEnd
OPTION (QueryRuleOff BuildSpool);

Performance is terrible for this query because there is no possible index that will make that join condition fast. The query takes 955 seconds of CPU time and 134 seconds of elapsed time at MAXDOP 8.

This blog post is about batch mode, so we need to turn that nested loop join into a hash join. It is important to recall that a hash join requires at least one equality condition in the join clause.

Exploring the Data

It’s important to Know Your Data while query writing, especially if you want to cheat in a performance competition. Taking a quick look at the Auctions table with the 400k row configuration:

The things that stand out to me are that there are an equal number of supply and demand rows, both supply and demand have nearly the same average value, and the maximum quantity for both supply and demand is under 100. We can exploit the relatively low maximum quantity value to improve performance. A supply interval with an end point that is more than 100 units away from a demand interval’s end point cannot possibly intersect that demand interval. This is one of those things that feels intuitively correct, but I’ll go ahead and prove it anyway by contradiction.

Suppose that 100 is the maximum interval length, that [d_start, d_end] and [s_start, s_end] overlap, and that s_end is more than 100 units greater than d_end.

  1. The distance between end points implies that d_end < s_end - 100
  2. If they overlap, then s_start < d_end
  3. This implies that s_start < d_end < s_end - 100
  4. This implies that s_start < s_end - 100
  5. This implies that s_end - s_start > 100

The final statement is impossible because 100 is the maximum interval length. You can do the same proof in the other direction. Therefore, it should be safe to add some filters to our original query:

SELECT d.ID DemandId, s.ID as SupplyID,
CASE
WHEN d.IntervalEnd >= s.IntervalEnd THEN s.IntervalEnd - CASE WHEN s.IntervalStart >= d.IntervalStart THEN s.IntervalStart ELSE d.IntervalStart END
ELSE d.IntervalEnd - CASE WHEN s.IntervalStart >= d.IntervalStart THEN s.IntervalStart ELSE d.IntervalStart END
END TradeQuantity
FROM #demand_intervals d WITH (TABLOCK)
CROSS JOIN #supply_intervals s WITH (TABLOCK)
WHERE d.IntervalEnd > s.IntervalStart AND d.IntervalStart < s.IntervalEnd
AND s.IntervalEnd >= d.IntervalEnd - 100
AND s.IntervalEnd <= d.IntervalEnd + 100;

Now we can get a proper index seek on the s.IntervalEnd column. Performance is significantly better with this additional filter clause. The total runtimes for all steps, including the omitted temp table creation, are 1030 ms of CPU time and 345 ms of elapsed time. We are of course not using batch mode though:

Buckets

Time for more math. Starting with the following filter conditions:

AND s.IntervalEnd >= d.IntervalEnd - 100
AND s.IntervalEnd <= d.IntervalEnd + 100;

I can divide both sides by 100 and the condition will still be true:

AND s.IntervalEnd / 100 >= d.IntervalEnd / 100 - 1
AND s.IntervalEnd / 100 <= d.IntervalEnd / 100 + 1

You’ll have to take my word for it that the equation is still true if we truncate everything down to integers:

AND FLOOR(s.IntervalEnd / 100) >= FLOOR(d.IntervalEnd / 100) - 1
AND FLOOR(s.IntervalEnd / 100) <= FLOOR(d.IntervalEnd / 100) + 1

Now we’re dealing with integers, so we can express this as the following:

FLOOR(s.IntervalEnd / 100) IN (FLOOR(d.IntervalEnd / 100) - 1, FLOOR(d.IntervalEnd / 100) + 0, FLOOR(d.IntervalEnd / 100) + 1)

Rewriting once again:

FLOOR(s.IntervalEnd / 100) = FLOOR(d.IntervalEnd / 100) - 1
OR FLOOR(s.IntervalEnd / 100) = FLOOR(d.IntervalEnd / 100) + 0
OR FLOOR(s.IntervalEnd / 100) = FLOOR(d.IntervalEnd / 100) + 1

That’s still not eligible for a hash join, but it is if we change it to a UNION ALL:

... JOIN ON ...
FLOOR(s.IntervalEnd / 100) = FLOOR(d.IntervalEnd / 100) - 1

UNION ALL

... JOIN ON ...
FLOOR(s.IntervalEnd / 100) = FLOOR(d.IntervalEnd / 100) + 0

UNION ALL

... JOIN ON ...
FLOOR(s.IntervalEnd / 100) = FLOOR(d.IntervalEnd / 100) + 1

We have successfully transformed a single BETWEEN join filter into three joins with equality filters. All three of those joins are eligible for batch mode.

A visual representation may be helpful. Suppose we have two demand intervals that end in 150.2 and 298.2. We need to match those intervals with 8 supply intervals with end points that range from 1.1 to 399.1. To do so, we can divide the intervals into buckets of 100 width and join on matching bucket values as well as the immediately adjacent bucket values:

The demand interval that ends in 150.2 has a bucket value of 1, so it is matched with supply buckets 0, 1, and 2. In this way, we can guarantee that any supply interval that’s 100 or fewer units away ends up getting matched with the right demand intervals. The supply buckets with a background of red are only matched to the demand interval of 150.2, blue is only matched to the demand interval of 298.2, and purple is matched to both. The buckets are overly inclusive of course. The actual rows that might match based on distance have their font color changed in the same way, but filtering out extra rows later will be no problem. The important thing is that we have an equality condition to work with. Now that we can finally perform hash joins, it’s time to work out all of the details.

Gotta Go Fast

We need to perform three operations:

  1. Calculate the maximum interval length.
  2. Calculate running totals for supply and demand, their bucket values, and load them into temporary objects.
  3. Query the temp tables to load the final results into another table.

For step 1, we simply need an index on the Quantity column to make the query fast. We also need to get an exclusive lock and hold it until we’re finished with all of the steps. Otherwise the max length could change in the middle of our code’s execution.

For step 2, the best performing options are either a MAXDOP 1 query with an ordered index scan and a batch mode window aggregate or a parallel query with a batch mode sort, parallel batch mode window aggregate, and parallel insert. An NCCI on the Auctions table is helpful in getting batch mode. In order to make the code go as fast as possible, I elected to use the parallel option at MAXDOP 4 even though it uses significantly more CPU time than the MAXDOP 1 option. DOP is at 4 instead of 8 due to contention caused by the NESTING_TRANSACTION_FULL latch. Here’s an estimated plan picture in case you don’t know what I mean by all of that:

For the third query, I’m using a UNION ALL of 3 joins that are eligible for batch mode like I said earlier. I added some query hints to reduce compile time.

As a step 0 that I forgot to mention earlier, create the following indexes on the Auctions table:

CREATE INDEX IX_Auctions__Code_ID__INCLUDE_Quantity ON Auctions (Code, ID) INCLUDE (Quantity);

CREATE INDEX IX_Auctions__Quantity ON Auctions (Quantity);

CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Auctions ON Auctions (Code) with (MAXDOP = 1);

You can find the full code below for all three steps:

SET NOCOUNT ON;

DROP TABLE IF EXISTS #PairingsCursor2;
CREATE TABLE #PairingsCursor2
(
DemandID INT NOT NULL,
SupplyID INT NOT NULL,
TradeQuantity DECIMAL(19, 6) NOT NULL
);

DROP TABLE IF EXISTS #demand_intervals;
CREATE TABLE #demand_intervals (
ID INT NOT NULL,
IntervalStart DECIMAL(19, 6) NOT NULL,
IntervalEnd DECIMAL(19, 6) NOT NULL,
IntervalBucket BIGINT NOT NULL
);

DROP TABLE IF EXISTS #supply_intervals;
CREATE TABLE #supply_intervals (
ID INT NOT NULL,
IntervalStart DECIMAL(19, 6) NOT NULL,
IntervalEnd DECIMAL(19, 6) NOT NULL,
IntervalBucket BIGINT NOT NULL
);

DECLARE @MaxQuantityRange DECIMAL(19, 6);

BEGIN TRANSACTION;

SELECT @MaxQuantityRange = MAX(Quantity) - MIN(Quantity)
FROM Auctions WITH (TABLOCKX);

INSERT INTO #demand_intervals WITH (TABLOCK)
SELECT ID, rt - Quantity IntervalStart, rt IntervalEnd, CAST(rt / @MaxQuantityRange AS BIGINT) AS IntervalBucket
FROM
(
SELECT a.ID, Quantity, SUM(Quantity) OVER (ORDER BY a.ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) rt
FROM dbo.Auctions a WITH (TABLOCK)
WHERE Code = 'D'
) q
OPTION (USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), MAXDOP 4);

INSERT INTO #supply_intervals WITH (TABLOCK)
SELECT ID, rt - Quantity IntervalStart, rt IntervalEnd, CAST(rt / @MaxQuantityRange AS BIGINT) AS IntervalBucket
FROM
(
SELECT a.ID, Quantity, SUM(Quantity) OVER (ORDER BY a.ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) rt
FROM dbo.Auctions a WITH (TABLOCK)
WHERE Code = 'S'
) q
OPTION (USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), MAXDOP 4);

/*
-- prevents temp table caching, slight performance overhead in last query, but avoids "expensive" stats gathering for uncached object scenario
CREATE STATISTICS s0 ON #demand_intervals (IntervalBucket) WITH SAMPLE 0 ROWS;
CREATE STATISTICS s1 ON #supply_intervals (IntervalBucket) WITH SAMPLE 0 ROWS;
*/

INSERT INTO #PairingsCursor2 WITH (TABLOCK)
SELECT d.ID DemandId, s.ID as SupplyID,
CASE
WHEN d.IntervalEnd >= s.IntervalEnd THEN s.IntervalEnd - CASE WHEN s.IntervalStart >= d.IntervalStart THEN s.IntervalStart ELSE d.IntervalStart END
ELSE d.IntervalEnd - CASE WHEN s.IntervalStart >= d.IntervalStart THEN s.IntervalStart ELSE d.IntervalStart END
END TradeQuantity
FROM #demand_intervals d WITH (TABLOCK)
INNER JOIN #supply_intervals s WITH (TABLOCK) ON d.IntervalBucket = s.IntervalBucket
WHERE d.IntervalEnd > s.IntervalStart AND d.IntervalStart < s.IntervalEnd

UNION ALL

SELECT d.ID DemandId, s.ID as SupplyID, CASE WHEN d.IntervalEnd >= s.IntervalEnd THEN s.IntervalEnd - CASE WHEN s.IntervalStart >= d.IntervalStart THEN s.IntervalStart ELSE d.IntervalStart END
ELSE d.IntervalEnd - CASE WHEN s.IntervalStart >= d.IntervalStart THEN s.IntervalStart ELSE d.IntervalStart END
END TradeQuantity
FROM #demand_intervals d WITH (TABLOCK)
INNER JOIN #supply_intervals s WITH (TABLOCK) ON s.IntervalBucket = d.IntervalBucket - 1
WHERE d.IntervalEnd > s.IntervalStart AND d.IntervalStart < s.IntervalEnd

UNION ALL

SELECT d.ID DemandId, s.ID as SupplyID, CASE WHEN d.IntervalEnd >= s.IntervalEnd THEN s.IntervalEnd - CASE WHEN s.IntervalStart >= d.IntervalStart THEN s.IntervalStart ELSE d.IntervalStart END
ELSE d.IntervalEnd - CASE WHEN s.IntervalStart >= d.IntervalStart THEN s.IntervalStart ELSE d.IntervalStart END
END TradeQuantity
FROM #demand_intervals d WITH (TABLOCK)
INNER JOIN #supply_intervals s WITH (TABLOCK) ON s.IntervalBucket = d.IntervalBucket + 1
WHERE d.IntervalEnd > s.IntervalStart AND d.IntervalStart < s.IntervalEnd
OPTION (USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), MAXDOP 8, HASH JOIN, CONCAT UNION, FORCE ORDER, NO_PERFORMANCE_SPOOL); -- reduce compile time

COMMIT TRANSACTION;

For the first execution, CPU time is generally around 1245 ms and elapsed time is around 500 ms. For subsequent executions, CPU time is around 634 ms and elapsed time is around 168 ms. Here is an actual execution plan for step 3:

Not surprisingly, the join for matching buckets returns significantly more rows than the joins for adjacent buckets.

REEEEEEEEEEEEECOMPILES

You may be wondering why the first execution takes so much longer than subsequent executions. The problem is the creation of temp table statistics. For some reason, SQL Server issues four StatMan queries per statistic that it creates:

Some of them run at MAXDOP 1 and some of them run at MAXDOP 8 and all of that time adds up. After the first execution the creation of statistics no longer happens, I assume due to statistics caching for temporary tables. Even adding an explicit FULLSCAN create statistics statement doesn’t avoid this problem for some reason. You can create 0 row statistics if you like (see the commented out code), but I’m going to declare a moral victory here instead of digging into it further. If the biggest performance problem for your query is statistics creation on a pair of 200k row temp tables then you probably have pretty efficient code.

With that said, table variables are an interesting alternative here. I get consistent runtimes of 550 CPU ms and 230 ms of elapsed time using table variables. The deferred compilation feature introduced in SQL Server 2019 is important to get the right query plan. We lose a bit of runtime for the MAXDOP 1 table variable inserts and the final insert is slightly less efficient as well due to the missing statistics. Still, the runtimes are consistent between the first and second executions and overall CPU usage is down. Also you have to admit that seeing a batch mode table variable scan with a correct cardinality estimate (without RECOMPILE) is pretty cool:
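For reference, the table variable version is just the same code with the temp table definitions swapped out for declarations like these, with the rest of the statements adjusted accordingly:

DECLARE @demand_intervals TABLE (
ID INT NOT NULL,
IntervalStart DECIMAL(19, 6) NOT NULL,
IntervalEnd DECIMAL(19, 6) NOT NULL,
IntervalBucket BIGINT NOT NULL
);

DECLARE @supply_intervals TABLE (
ID INT NOT NULL,
IntervalStart DECIMAL(19, 6) NOT NULL,
IntervalEnd DECIMAL(19, 6) NOT NULL,
IntervalBucket BIGINT NOT NULL
);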

Query Scaling

The performance of the batch mode query greatly depends on the maximum interval length. Query runtime increases in a roughly linear fashion as the max length increases:

The linear pattern is broken for the final test case because all of the rows are in a single bucket at that point, so two out of three of the joins don’t do much of anything. Is this type of query safe to use in production? Sure, as long as you’re okay with significantly worse performance if a single outlier row happens to appear. Possible defenses against that include additional constraints on the data, knowing your data well, or writing a fallback algorithm if the maximum interval length is too long for the batch mode bucketizing approach to perform well.

Final Thoughts

This blog post shows how batch mode hash joins can be used as an efficient solution to find intersecting intervals, provided that the maximum interval length is sufficiently small. The important thing is everyone involved had fun. Thanks for reading!

Blocking on Columnstore Indexes that RCSI and NOLOCK Don’t Resolve

I recently ran into a production issue where a SELECT query that referenced a NOLOCK-hinted table was hitting a 30 second query timeout. Query store wait stats suggested that the issue was blocking on a table with a nonclustered columnstore index (NCCI). This was quite unexpected to me, and I was eventually able to produce a reproduction of the issue. I believe this to be a bug in SQL Server that’s present in both RTM and the current CU as of this blog post (CU14). The issue impacts CCIs as well, but I did significantly less testing with that index type.

The Setup

First I’ll create a heap with 500k rows. All of the rows have the same value. I’ll also create an NCCI on the single column of the table.

DROP TABLE IF EXISTS dbo.TEST_NCCI_1;
CREATE TABLE dbo.TEST_NCCI_1 (
ID VARCHAR(10) NOT NULL
);

INSERT INTO dbo.TEST_NCCI_1
SELECT TOP (500000) '1'
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;

CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI ON dbo.TEST_NCCI_1 (ID) WITH (MAXDOP = 1);

The NCCI has a single compressed rowgroup with no rows in the delta store or delete buffer. It’s an honest table and I’m not trying to trick you:

We also need a table to join to. A heap with a single row will serve that need:

DROP TABLE IF EXISTS dbo.JOIN_TO_ME;
CREATE TABLE dbo.JOIN_TO_ME (
ID VARCHAR(10) NOT NULL
);

INSERT INTO dbo.JOIN_TO_ME VALUES ('1');

Suppose a different session has a long held exclusive table lock on TEST_NCCI_1. For example, the code below could be running in a different session with a yet to be committed transaction:

BEGIN TRANSACTION;

SELECT TOP (1) *
FROM dbo.TEST_NCCI_1 WITH (TABLOCKX);

Would you expect the following query to be blocked by the open transaction?

SELECT COUNT_BIG(*)
FROM dbo.TEST_NCCI_1 a WITH (NOLOCK)
Left Outer Join dbo.JOIN_TO_ME b WITH (NOLOCK) ON a.ID = b.ID;

Test Results

I ran 45 tests in total by varying the isolation level (SET TRANSACTION ISOLATION LEVEL) and the locking hint at the table level. You may be wondering why I bothered to do so. According to the documentation, a table level locking hint overrides the isolation level for read operations:

Only one of the isolation level options can be set at a time, and it remains set for that connection until it is explicitly changed. All read operations performed within the transaction operate under the rules for the specified isolation level unless a table hint in the FROM clause of a statement specifies different locking or versioning behavior for a table.

This is not what I observed which is part of why I believe that the current behavior in SQL Server should be considered a bug. In the table below, “YES” means that the query was blocked and “NO” means the query was able to execute. I used red font for the behavior which I believe to be in error based on my understanding of expected locking behavior in SQL Server.

To summarize the results:

  • The query is always blocked under the read committed, repeatable read, and serializable isolation levels. The table locking hint does not matter.
  • Blocking behavior works as expected under the read uncommitted and snapshot isolation levels.
  • RCSI does not help here. With RCSI, to avoid getting blocked you need to set the transaction isolation level to READ UNCOMMITTED.

I was able to use the lock_acquired extended event to see the problematic requested lock. In the screenshot below, the results are filtered to the object level. The first set of locks that occurred around the 11 second mark were under the read uncommitted isolation level. The second set of locks at around the 21 second mark were under the read committed isolation level. A table level NOLOCK hint was included for both queries.

To be clear, it is unexpected to see an IS object lock for a SELECT query with a table level NOLOCK hint. As far as I can tell, there are no interesting locks taken after the IS lock is acquired:

I attempted to investigate further by getting callstacks before the IS lock is acquired:

sqlmin.dll!XeSqlPkg::lock_acquired::Publish+0x228
sqlmin.dll!lck_lockInternal+0x1139
sqlmin.dll!LockAndCheckState+0x2c5
sqlmin.dll!GetHoBtLockInternal+0x445
sqlmin.dll!ColumnDataSetSession::WakeUp+0xa86
sqlmin.dll!NormalColumnDataSet::WakeUp+0xe
sqlmin.dll!ColumnDataSetSession::Create+0x183
sqlmin.dll!ColumnsetSS::WakeUpInternal+0x1ec
sqlmin.dll!ColumnsetSS::WakeUp+0x15e
sqlmin.dll!CreateDictionaryRowsetHelper+0x250
sqlmin.dll!CBpLocalDictionaryManager::GetSEStringDictionaryRowset+0x416
sqlmin.dll!CBpLocalDictionaryManager::FGetData+0xd1
sqlmin.dll!CBpDeepDataContextForBatch::GetStringValue+0x1d
sqlTsEs.dll!CTEsHashMultiData<167,1>::MdEsIntrinFn<CMDPureInput,CMDIteratorAllPureInputs>+0x13d
sqlTsEs.dll!CEsMDIntrinsicWrapper::UnaryImpl<&CTEsHashMultiData<167,1>::MdEsIntrinFn<CMDPureInput,CMDIteratorAllPureInputs>,&CTEsHashMultiData<167,1>::MdEsIntrinFn<CMDImpureInput,CMDIterator> >+0xd1
sqlTsEs.dll!CEsExecMultiData::GeneralEval+0x188
sqlTsEs.dll!CMultiDataEsRuntime::Eval+0x247
sqlmin.dll!CBpComputeMulticolumnHashRunTime::Eval+0x5b7
sqlmin.dll!CBpPartialJoin::ProcessProbeSide+0x445
sqlmin.dll!CBpQScanHashJoin::Main+0x148
sqlmin.dll!CBpQScanHashJoin::BpGetNextBatch+0x28
sqlmin.dll!CBpQScan::GetNextBatch+0x6f
sqlmin.dll!CBpChildIteratorScanner::BpGetNextBatch+0x12
sqlmin.dll!CBpQScanHashAggNew::ProcessInput+0x8b

I’m not terribly surprised to see a reference to a dictionary in the callstacks because I’m not able to reproduce the issue with an INT or BIGINT column. However, I have no idea where to go from here.

Workarounds

Adding an IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX query hint was an effective workaround for all of my test cases. I experienced the expected locking behavior for all test cases with that hint, so it is apparent to me that the issue is caused by the query plan reading from the NCCI.

Disabling batch mode using the DISALLOW_BATCH_MODE use hint seems to be an effective workaround as well, but I did not test this as thoroughly.
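Applied to the repro query from earlier, the two workarounds look like this:

SELECT COUNT_BIG(*)
FROM dbo.TEST_NCCI_1 a WITH (NOLOCK)
LEFT OUTER JOIN dbo.JOIN_TO_ME b WITH (NOLOCK) ON a.ID = b.ID
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);

SELECT COUNT_BIG(*)
FROM dbo.TEST_NCCI_1 a WITH (NOLOCK)
LEFT OUTER JOIN dbo.JOIN_TO_ME b WITH (NOLOCK) ON a.ID = b.ID
OPTION (USE HINT('DISALLOW_BATCH_MODE'));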

Changing the isolation level may also be an acceptable way to avoid the unexpected blocking for some scenarios. For example, if you already have a NOLOCK table hint in place then changing the transaction isolation level to read uncommitted may not introduce any new data correctness issues.

Final Thoughts

For some queries, a simple join on a VARCHAR column between a table with a columnstore index and another table can lead to unexpected blocking, even with a table level NOLOCK hint or with read committed snapshot isolation enabled for the database. This is frustrating and disappointing behavior from a concurrency perspective. It is an odd scenario where enabling snapshot isolation offers a significant benefit over enabling read committed snapshot isolation. I hope that this locking behavior changes in a future version of SQL Server. Thanks for reading!

UDF Inlining And String Aggregations In SQL Server 2019

Quatro Enemigos


This post is really four different posts. Maybe five. The main points are:

Let’s start at the top, because tops are top.

Strung Out


If you’re like me and you got excited by the induction of STRING_AGG into the T-SQL Lexicon because of all the code odd-balling it would replace, you were likely also promptly disappointed for a few reasons.

  • Needing to convert the element to be AGGed to a MAX to avoid errors
    • STRING_AGG aggregation result exceeded the limit of 8000 bytes. Use LOB types to avoid result truncation
  • The WITHIN GROUP ordering is clunky compared to an outer ORDER BY (but hey, logical query processing…)
  • No support for DISTINCT in the function, and an outer DISTINCT tries to make the wrong thing DISTINCT (see above)

And of course, it’s a breaking limitation for UDF inlining.

The UDF does not reference the STRING_AGG function

Let’s look at all of that stuff at once, in one function.

CREATE OR ALTER FUNCTION 
    dbo.IsStringAgg (@UserId int)
RETURNS 
    nvarchar(max)
WITH 
    RETURNS NULL ON NULL INPUT, 
    SCHEMABINDING
AS
BEGIN
RETURN
(
    SELECT
        STRING_AGG
        (
            CONVERT
            (
                nvarchar(MAX),
                b2.Name
            ), 
            N', '
        )
        WITHIN GROUP 
        (
            ORDER BY 
                b2.Name
        )
    FROM
    (
        SELECT DISTINCT 
            b.Name
        FROM dbo.Badges AS b
        WHERE b.UserId = @UserId
    ) AS b2
);
END;
GO

Not exactly a thing of beauty, is it? Let’s hold onto that for one second, though.

XML > JSON


Okay, so let’s get back to that UDF documentation.

The UDF does not reference XML methods

That sort of reads like NOXML4U, which is my friend Jeremiah’s license plate. In reality though, it means all the .method thingies, like value, nodes, query, etc.

So if you were to try to follow my prior advice on string concatenation, the function couldn’t be inlined.

SELECT
   x = 
   (
       SELECT 
           [text()] = 
               b.Name
       FROM dbo.Badges AS b
       WHERE b.Id = 100564
       FOR XML 
           PATH(''),
           TYPE
   ).value
     (
         './text()[1]', 
         'nvarchar(max)'
     );

Having the .value breaks inlining. So there’s that. I believe this is where Reddit users post something like le sigh with some poorly drawn stick figure.

Del The Funktion


We can write the function semi-correctly like so:

CREATE OR ALTER FUNCTION 
    dbo.NotStringAgg (@UserId int)
RETURNS 
    nvarchar(max)
WITH 
    RETURNS NULL ON NULL INPUT, 
    SCHEMABINDING
AS
BEGIN
RETURN
    STUFF
    (
        (
            SELECT
                N', ' + 
                b.Name
            FROM dbo.Badges AS b
            WHERE b.UserId = @UserId
            GROUP BY b.Name
            ORDER BY b.Name
            FOR XML PATH (N'')
        ), 
        1,
        2,
        N''
    );
END;
GO

Alright, now let’s BRING THAT CODE BACK (air horn)

Sea Vessel Soliloquy


If we compare the execution plans for these two functions, the XML one gets inlined, and the STRING_AGG one does not.

SELECT TOP (5000)
    u.DisplayName,
    ThanksForAllYourHardWork = 
        dbo.NotStringAgg(u.Id)
FROM dbo.Users AS u
ORDER BY u.Reputation DESC;
GO 

SELECT TOP (5000)
    u.DisplayName,
    ThanksForAllYourHardWork = 
        dbo.IsStringAgg(u.Id)
FROM dbo.Users AS u
ORDER BY u.Reputation DESC;
GO

Here’s the plan for the inlined function:

SQL Server Query Plan
blaze it

Here’s the plan for the non-inlined function:

SQL Server Query Plan
thin ice

Stay Thirsty


The inlined function finishes about twice as fast, though one may pause for a moment to consider whether the 400ms difference is an earth shattering kaboom in this case.

Of course, the real kicker is when scalar UDFs are invoked as part of larger queries where parallelism, etc. is important.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Rewriting Scalar UDFs For SQL Server 2019 Inlining

You May Find Yourself


SQL Server 2019 has been a bit of a roller coaster ride. In particular, UDF inlining started as I think the most interesting addition to the product. Big brain stuff, for sure.

It has been nerfed quite a bit, with seemingly more and more restrictions added in every cumulative update. Hopefully some of these can be lifted as the feature matures, but I understand how difficult all this is.

People program absolute bloodbaths into functions.

Today, I want to look at one restriction that has a fairly simple workaround: Calling GETDATE().

Emptied


Let’s use a simple function that figures out if fewer than 90 days exist between two dates.

CREATE OR ALTER FUNCTION dbo.sneaky
(
    @StartDate datetime,
    @EndDate datetime
)
RETURNS bit
WITH
    SCHEMABINDING,
    RETURNS NULL ON NULL INPUT
AS
BEGIN
RETURN 
    CASE 
        WHEN DATEDIFF
             (
                 DAY,
                 @StartDate,
                 ISNULL(@EndDate, GETDATE())
             ) < 90
       THEN 1
       ELSE 0 
    END;
END;
GO

This function can’t be inlined, because we call GETDATE() inside the function body. We can witness all the baddities that scalar UDFs cause as usual.

SELECT
    u.DisplayName,
    sneaky = 
        dbo.sneaky(u.CreationDate, u.LastAccessDate)
FROM dbo.Users AS u
WHERE EXISTS
      (
          SELECT
              1/0
          FROM dbo.Posts AS p
          JOIN dbo.Votes AS v
              ON v.PostId = p.Id
          WHERE p.OwnerUserId = u.Id
      );

The query won’t be allowed to use parallelism, the function will execute once per row it needs to process, etc. etc. etc.

SQL Server Query Plan
quarters

Cool Street


The better option is to use a third function argument that you can pass GETDATE() to.

CREATE OR ALTER FUNCTION dbo.sneakier
(
    @StartDate datetime,
    @EndDate datetime,
    @FallBack datetime
)
RETURNS bit
WITH
    SCHEMABINDING,
    RETURNS NULL ON NULL INPUT
AS
BEGIN
RETURN 
    CASE 
        WHEN DATEDIFF
             (
                 DAY,
                  @StartDate,
                  ISNULL(@EndDate, @FallBack)
             ) < 90
       THEN 1
       ELSE 0 
    END;
END;
GO

You could also change the calling query to protect from NULLs and remove the check from the function, like so:

dbo.sneaky(u.CreationDate, ISNULL(u.LastAccessDate, GETDATE()))

Calling the new function like so results in a much faster query execution time:

SELECT
    u.DisplayName,
    sneaky = 
        dbo.sneakier(u.CreationDate, u.LastAccessDate, GETDATE())
FROM dbo.Users AS u
WHERE EXISTS
      (
          SELECT
              1/0
          FROM dbo.Posts AS p
          JOIN dbo.Votes AS v
              ON v.PostId = p.Id
          WHERE p.OwnerUserId = u.Id
      );
GO
SQL Server Query Plan
on my head?

The plan is allowed to go parallel, and rather than the function being hidden in a Compute Scalar operator, it’s represented in the query plan by Constant Scan operators that produce the necessary rows.

SQL Server Query Plan
too many horses

Pokemon Drift


These types of rewrites will probably become more common as people move to newer versions of SQL Server, and embrace higher compatibility levels where these features are allowed to maneuver.

Of course, at the rate things change, that may be a long ways off.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

A Silent Bug With Clustered Column Store Indexes And Computed Columns In SQL Server 2019

Oh Boy


This bug has been reported to the proper authorities, and this blog post is not to dunk on the Summer Intern responsible for column store indexes, but to warn you to not do this, because I don’t know what it will break under the covers.

If you read the documentation for column store indexes, it says that column store indexes can’t be created on persisted computed columns.

And that’s true. If we step through this script, creating the column store index will fail.

/*Bye Felicia*/
DROP TABLE IF EXISTS dbo.cci;
GO 

/*Hi Felicia*/
CREATE TABLE dbo.cci(id int, fk int, sd date, ed datetime);
GO 

/*COMPUTER!*/
ALTER TABLE dbo.cci ADD cc AS id * 2;
ALTER TABLE dbo.cci ALTER COLUMN cc ADD PERSISTED;
GO 

/*Rats*/
CREATE CLUSTERED COLUMNSTORE INDEX c ON dbo.cci;
GO

The error message we get is pretty clear:

Msg 35307, Level 16, State 1, Line 76
The statement failed because column 'cc' on table 'cci' is a computed column. 
Columnstore index cannot include a computed column implicitly or explicitly.

80 Blocks from Tiffany’s


However, if we change the order of things a little bit, we’ll find that we can add a persisted computed column to a table with a clustered column store index on it.

/*Bye Felicia*/
DROP TABLE IF EXISTS dbo.cci;
GO 

/*Hi Felicia*/
CREATE TABLE dbo.cci(id int, fk int, sd date, ed datetime, INDEX c CLUSTERED COLUMNSTORE);
GO 

/*???*/
ALTER TABLE dbo.cci ADD cc AS id * 2;
ALTER TABLE dbo.cci ALTER COLUMN cc ADD PERSISTED;
GO 

/*With data?*/
INSERT dbo.cci
(
    id, fk, sd, ed
)
VALUES
(
    10, 10, GETDATE(), GETDATE()  
);

/*yepyepyepyepyepyep*/
SELECT
    c.*
FROM dbo.cci AS c

/*Eh?*/
ALTER INDEX c ON dbo.cci REBUILD;
ALTER INDEX c ON dbo.cci REORGANIZE;
DBCC CHECKTABLE('dbo.cci');

And everything seems to work. However, if we drop the column store index, it can’t be re-created.

The key here is that the computed column is added, and then the persisted attribute is added second. That is where an error should be thrown.

But what do I know? I’m just a bouncer.

Bomber


Again, I’m telling you not to do this. I’m telling you that it might break things in gruesome and horrible ways.

I don’t think that this is a preview of upcoming support for persisted computed columns in clustered column store indexes.

Thanks for reading!

p.s. Really, don’t do this

p.p.s. If you do this you’re asking for trouble

p.p.p.s. This was fixed in CU12

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.