Should An Optimistic Isolation Level Be The New Default For SQL Server Databases?

Two Words


Alright, I’ve lied to you already. There are way more than two words involved, here.

I recently had breakfast (JUST BREAKFAST, GOD) with Microsoft’s Most Handsome Man™, and the topic came up.

Partially because there are such fundamental changes coming to SQL Server, and also because it’s already the default in Azure SQL DB.

If I had to name a redeeming quality of Azure SQL DB, that would be it.

Unboxing


I’ve seen pessimistic locking causing problems and heartache on many, many servers.

While it wasn’t totally the fault of the isolation level (query and index tuning was needed, and there was some other silliness), it shouldn’t take that kind of dedication to make It Just Run Faster©.

Possibly the worst side effect is people leaning heavily on reading dirty data (via read uncommitted/nolock) to avoid issues.

You can’t preach about the evils of dirty reads without offering ways to avoid blocking.

Yes, I’m looking at you.

You all cackle at seeing NOLOCK everywhere, but I don’t hear much about solving blocking problems without it.

Standards and Futures


Right now “other major database platforms” offer optimistic locking as the default.There are implementation differences, but the net result is the same.

Readers and writers don’t suffer locking contention, and only fully committed data is read.

SQL Server currently offers optimistic locking via Read Committed Snapshot Isolation (RCSI from here), and Snapshot Isolation (SI from here).

Right now, they both send row versions to tempdb. But in the works for SQL Server 2019 is a feature called Accelerated Database Recovery (ADR from here). The mechanism that makes this possible is a Persistent Version Store (PVS from here) that’s local to the database, not tempdb.

While RCSI and SI could end up sending quite a bit of data to tempdb, which could cause contention there without a lot of precautions and setup work, any potential contention from ADR would be localized.

I know, Microsoft has been addressing tempdb, too. Setup now configures tempdb in a mostly sane way, and trace flags 1117 and 1118 are the default behavior in 2016+. That should make optimistic locking a more comfortable setting for people, but no. No. I hardly ever see it in use.

Dependencies


Breaking changes are hard. I know, there are people out there who depend on pessimistic locking for certain patterns to maintain correctness.

The thing is, it’s a lot easier to have just these processes use pessimistic locking while allowing the rest of the us to not have to sprinkle NOLOCK hints every which where to get a report to run.

An example of a process that requires some locking hints is Paul White’s post about Sequence Tables.

At this point in time, if you’re implementing a queue in SQL Server, you should be the one learning about how to get the kind of locking you need to get it working. Normal people who just want their app to work shouldn’t be the ones staring at articles about isolation levels, locking hints, lock escalation, etc.

Get It, Get It


This is a good problem to get in front of. I’d be quite happy to not have to talk about the reader/writer aspects of locking anymore.

In the same way, it would be nice to not have to talk to users who are endlessly frustrated by locking problems, explain dirty reads, explain optimistic isolation levels, caution them against certain issues, and then tell them to have fun removing all those old lock hints.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.



17 thoughts on “Should An Optimistic Isolation Level Be The New Default For SQL Server Databases?

  1. You speak of people cackling about not using WITH (NOLOCK) and not providing solutions otherwise and then talk about how RCSI and SI are the solution. Fortunately, you already know the reason why a lot of people haven’t implemented that as a solution because you wrote…

    “While RCSI and SI could end up sending quite a bit of data to tempdb, which could cause contention there without a lot of precautions and setup work.”

    The other and best fix has been to fix utterly crap code. 😀 😀 😀

    I’m looking forward to 2019 to see how well PVS actually works but my warning would be that is shouldn’t be used as a veil to cover up the rotting meat known as crap code. 😉

    As for MS making TF 1117 the unchangeable default for TempDB, that screwed me for some of the stuff I do. Even though there’s a “connect” item (no longer called that but the replacement is hard to remember because I normally don’t use that) on it, a lot of people don’t realize that IDENTITY INSERT causes all the data being copied to materialize in TEMPDB and if all your files are set to grow evenly, you end up with 1.6 worth of required space when trying to copy one 200GB table and you have 8 TempDB files because you end up with a 1.6TB TempDB, which is just stupid. The enforcement of the effects of TF1117 and not being able to temporarily disable it have screwed me.

    Let’s hope that the “improvements” they’re making in 2017 thru 2019 (I’m currently using 2016 EE) don’t put the screws to me and others that have no so unusual big stuff to do.

  2. p.s. And apologies for the structure and the occasional misspellings in my response… I’ve not yet had sufficient quantities of caffeine this morning and haven’t had a cigarette since Friday night and the crowbar that I’m chewing on keeps getting in the way.

    1. Good luck quitting. I stopped 10 years ago (pack a day for 15 years), and was basically delirious for the first week. I think I spent more money on chewing gum the next three months than I would have on cigarettes.

      Friends of mine who have quit more recently have switched over to vaping and tapering down the strength of whatever the hell is in those cartridges.

      1. Update. Still haven’t had even a puff because I know that “I’m just a puff away from a pack a day”. Been there, done that. I haven’t been able to quit nicotine yer. Still using mint flavored nicotine lozenges. 😀

          1. Same here on the 2MG thing. And the mint part of it is another addiction that I have to fight. It turns out that regular peppermint lozenges were a part of the reason for me going into AFIB the time I tried to quit a few years before that.

            I also miss going outside for a smoke. I’m trying to do that between lozenges but it’s not working. 🙁

  3. I disagree RCSI should be the default. I had the unfortunate experience of learning this lesson the hard way. A few years ago in an effort to improve performance I enabled RCSI on database which sees 35K transaction a second. Of course we did this in dev and qa first, but testing didn’t have the same variety and volume of transactions. After turning on in production we encountered performance problems. A Sev A case with Microsoft lasted one week until we turned off RSCI–system back to “normal”. Is this bad code, perhaps? Or it could be RCSI performs terribly on very high-volume systems and you should very cautious about enabling. The other scary thing, RCSI is required for AlwaysOn–won’t using that for this system.

  4. Frankly, I can’t really expect the devs to know how different apps behave differently in the DB after switching some parts to RCSI. Working with optimistic concurrency is very different than the default and well-known pessimistic way, and I’m 100% sure it will cause many cases of “wait, why are we processing the same rows again?!” / “why is the balance now negative?!” / etc.

    I’d much rather handle occasional outages due to blocking and improve the queries and indexing (while teaching the devs to do it too) than risk the business processes doing unexpected stuff.

    For a new company developing a new product and infrastructure, sure. But changing parts of existing apps is very risky, before even considering the performance implications.

    1. “Processing the same rows again” sounds exactly like the kind of queueing that should be handled via pessimistic locking. My point isn’t that it’s a perfect fit for every situation, only that it’s a far less painful fit in most situations. Everyone who does-this-specific-thing will disagree with me, but it’d be far easier for people doing-specific-things to say “set transaction isolation level read committed” instead of making the rest of the world wrestle with awful blocking issues.

  5. Well, knowing that the “Serializable” is also the default isolation level specified by the SQL standard, having Read Committed as default seems a little bit unfair. Now, many things are improved – from software to hardware and may be it is a time for a change (at least for the newly created databases).

    The RCSI is the default for SQL database, so I believe soon we will see it as default for SQL Server database, too.

  6. Paul White’s post about sequence tables isn’t really a thing anymore, is it? I mean, that article was from 2010 and MSSQL now has these sequence objects which kind of eliminates the need for manually handling all that, no?

    1. Yes and no. I still see people use sequence tables fairly often, because they were implemented pre-2012.

      Sometimes, they can’t fully make the switch over because they still have to support older versions (or the code isn’t causing a problem, so they leave it alone).

      Mostly though, it’s a good example of how to use locking hints correctly for a specific situation, which I’d rather see be the norm instead of people needing to use NOLOCK.

      Thanks!

    2. Also, a serialized sequence table has an advantage over sequence / identity / etc.: you can much better control gaps, if that is important. It shouldn’t be important – in most cases, it’s only important because someone wants it to be important, but there are some instances where a gap is a legal nightmare.

      1. Just have a process that runs in a loop checking the max used id in each table against the max used id in the sequence table and reset it if there’s a gap right? 😭

        1. Perhaps this is a bad solution, but I created a query to return the next unused ID in a table which could have gaps. We wanted 4 digit numbers, but someone blew that by manually picking 9999, so I couldn’t just use MAX(myKey) and overlook the gaps…
          The SQL Statement uses a self-join and looks like this:
          SELECT MIN(t1.myKey+1) FROM myTable t1 WHERE NOT EXISTS(SELECT NULL FROM myTable t2 WHERE t2.myKey = t1.myKey + 1)
          Cheers!

Comments are closed.