Normalizing SQL Server Tables To Reduce Query Blocking

Chalky


I see a lot of tables that look something like this:

CREATE TABLE dbo.orders
(
    order_id int NOT NULL PRIMARY KEY
         DEFAULT (NEXT VALUE FOR dbo.order_id),
    order_date datetime NOT NULL,
    order_ship_date datetime NOT NULL,
    order_total money NOT NULL,
    order_tax money NOT NULL,
    customer_id int NOT NULL
        DEFAULT (NEXT VALUE FOR dbo.customer_id),
    customer_fullname nvarchar(250),
    customer_street nvarchar(250),
    customer_street_2 nvarchar(250),
    customer_city nvarchar(250),
    customer_state nvarchar(250),
    customer_zip nvarchar(250),
    customer_country nvarchar(250)
);

Looking at the design, there are two big problems:

  1. There are “order” columns that are going to get a lot of inserts and updates
  2. You’re going to be storing the same customer information over and over again

The more related but independently-changing data you store in the same table, the harder it becomes to index that table effectively.

A while back, I blogged about Tables Within Tables, but uh… surprisingly, the problem still exists! Usually when I blog about something, the problem disappears. Hm.

Better But Not Best


If we follow some practical guidance and move the customer columns into their own table, we end up with this:

CREATE TABLE dbo.orders
(
    order_id int NOT NULL PRIMARY KEY
         DEFAULT (NEXT VALUE FOR dbo.order_id),
    order_date datetime NOT NULL,
    order_ship_date datetime NOT NULL,
    order_total money NOT NULL,
    order_tax money NOT NULL,
    customer_id int NOT NULL
);

CREATE TABLE dbo.customers
(
    customer_id int NOT NULL PRIMARY KEY
        DEFAULT (NEXT VALUE FOR dbo.customer_id),
    customer_fullname nvarchar(250),
    customer_street nvarchar(250),
    customer_street_2 nvarchar(250),
    customer_city nvarchar(250),
    customer_state nvarchar(250),
    customer_zip nvarchar(250),
    customer_country nvarchar(250)
);

This is a better scenario, because we only store customer information once, and inserts/updates to order information don’t impact people working with customer data.
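Queries that need both halves of the data now join on customer_id. A quick sketch against the tables above (the date filter is made up for illustration):

```sql
SELECT
    o.order_id,
    o.order_date,
    o.order_total,
    c.customer_fullname,
    c.customer_city
FROM dbo.orders AS o
JOIN dbo.customers AS c
    ON c.customer_id = o.customer_id
WHERE o.order_date >= '20240101'; -- example filter, not from the original tables
```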

But this still isn’t great — what if a customer wants to send an order to a different address?

If we wanted to store everything in this table, we’d be breaking other practical rules: we’d either need multiple rows per customer, or we’d have to keep adding columns to the table to deal with multiple addresses. That’s a mess both for people who don’t use all those extra columns, and for people who might have half a dozen addresses they ship to.

Getting There


A better way to phrase the customer table might be like this:

CREATE TABLE dbo.customers
(
    customer_id int NOT NULL PRIMARY KEY
        DEFAULT (NEXT VALUE FOR dbo.customer_id),
    default_fullname nvarchar(250),
    default_street nvarchar(250),
    default_street_2 nvarchar(250),
    default_city nvarchar(250),
    default_state nvarchar(250),
    default_zip nvarchar(250),
    default_country nvarchar(250)
);

Most of the time, people are going to send stuff to one address — call it home if you want. It’s probably also their billing address, so it makes sense for it to be the default, and to have it be the first choice.

Then we’ll have an address book table that looks like this:

CREATE TABLE dbo.customers_address_book
(
    address_id int NOT NULL
        DEFAULT (NEXT VALUE FOR dbo.address_id),
    customer_id int NOT NULL,
    address_type tinyint,
    customer_fullname nvarchar(250),
    customer_street nvarchar(250),
    customer_street_2 nvarchar(250),
    customer_city nvarchar(250),
    customer_state nvarchar(250),
    customer_zip nvarchar(250),
    customer_country nvarchar(250),
    CONSTRAINT pk_cab_id PRIMARY KEY (customer_id, address_id)
);

In a table like this, whenever a customer ships to a non-default address it gets stored off here. Now customers can have as many addresses as they want to choose from without us having to have an extra bloated table of default information plus non-default information.

Because of the way this data is modeled, we don’t need to keep adding columns to accommodate multiple addresses. We just tack rows on, and since this data isn’t likely to get updated the insert/select pattern should end up with minimal blocking.
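If you want one result set with every address a customer can pick from, a sketch might union the default row in with the address book (the variable and the 0 sentinel are my own placeholders, not part of the design above):

```sql
DECLARE @customer_id int = 1; -- placeholder value

SELECT
    cab.address_id,
    cab.customer_fullname,
    cab.customer_street,
    cab.customer_city,
    cab.customer_country
FROM dbo.customers_address_book AS cab
WHERE cab.customer_id = @customer_id

UNION ALL

SELECT
    0, -- sentinel meaning "the default address" from dbo.customers
    c.default_fullname,
    c.default_street,
    c.default_city,
    c.default_country
FROM dbo.customers AS c
WHERE c.customer_id = @customer_id;
```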

Tomato Sauce


I know, horrifying. You might have to write a join. You poor, downtrodden developer.

Of course, this makes the most sense when you’re dealing with OLTP workloads. And sure, a lot of these columns probably don’t need to be as long as they are, but that’s a totally different post.

When you’re dealing with reporting data, denormalizing is generally preferred. Though if you’re doing serious reporting and using columnstore indexes, I’d probably wanna keep the strings out as much as possible, and just key back to them in other tables. Yuck.

Have I ever mentioned that strings in databases were a mistake?

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers who click through from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Locking Hints Make Troubleshooting Blocking And Deadlocks Confusing In SQL Server

King Of The DMV


Many people will go their entire lives without using or seeing a lock hint other than NOLOCK.

Thankfully, NOLOCK only ever leads to weird errors and incorrect results. You’ll probably never have to deal with the stuff I’m about to talk about here.

But that’s okay, you’re probably busy with the weird errors and incorrect results.

Fill The Void


It doesn’t matter who you are, or which Who you use, they all look at the same stuff.

If I run a query with a locking hint to use the serializable isolation level, it won’t be reflected anywhere.

SELECT 
    u.*
FROM dbo.Users AS u WITH(HOLDLOCK)
WHERE u.Reputation = 2;
GO 100

Both sp_WhoIsActive and sp_BlitzWho will show the query as using Read Committed.

EXEC sp_WhoIsActive 
    @get_task_info = 2,
    @get_additional_info = 1;

EXEC sp_BlitzWho 
    @ExpertMode = 1;

This isn’t to say that either of the tools is broken, or wrong necessarily. They just use the information available to them.
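That information comes from the session and request DMVs, which you can check yourself from another window while the hinted query runs:

```sql
-- transaction_isolation_level reports 2 (Read Committed) here,
-- even though the query is using HOLDLOCK to get serializable behavior
SELECT
    r.session_id,
    r.transaction_isolation_level
FROM sys.dm_exec_requests AS r
WHERE r.session_id <> @@SPID;
```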

[Screenshot: sp_WhoIsActive locks — “ah well”]

Higher Ground


If you set the isolation level at a higher level, they both pick things up correctly.

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

SELECT 
    u.*
FROM dbo.Users AS u WITH(HOLDLOCK)
WHERE u.Reputation = 2;
GO 100
[Screenshot: sp_WhoIsActive locks — “gratz”]

Deadlocks, Too


If we set up a deadlock situation — and look, I know, these would deadlock anyway, that’s not the point — we’ll see the same isolation level incorrectness in the deadlock XML.

BEGIN TRAN

UPDATE u
    SET u.Age = 1
FROM dbo.Users AS u WITH(HOLDLOCK)
WHERE u.Reputation = 2;

UPDATE b
    SET b.Name = N'Totally Tot'
FROM dbo.Badges AS b WITH(HOLDLOCK)
WHERE b.Date >= '20140101';

ROLLBACK

Running sp_BlitzLock:

EXEC sp_BlitzLock;
[Screenshot: sp_BlitzLock output — “grousin’”]

Again, it’s not like the tool is wrong. It’s just parsing out information from the deadlock XML. The deadlock XML isn’t technically wrong either. The isolation level for the transaction is read committed, but the query is asking for more.

The problem is obvious when the query hints are right in front of you, but sometimes people will bury hints down in things like views or functions, and it makes life a little bit more interesting.

Thanks for reading!


Starting SQL: Compensating For Lock Waits In SQL Server

No, Lock


Locks are necessary, I’m told. There’s this whole “ACID” thing that makes databases work. It’s generally a good idea for people who are into correct answers.

But blocking is one of the most common performance problems I see. Why? Because a long time ago Microsoft made the silly choice to use a pessimistic isolation level by default.

That changes in Azure SQL DB, but what it means for everyone else is that when modification queries come trotting along with their incessant need to lock things, other queries that might want to work with that data have to wait.

For other write queries, that’s a totally acceptable scenario. For queries that need to read data, it’s not cool at all.

In Your Area


Other database platforms use an optimistic isolation level, often called multi-version concurrency control (MVCC).

Under those implementations, queries that need to read data currently being modified will read the previously committed version. Think of it as reading the last known good version of the data.

You can do that in SQL Server using Read Committed Snapshot Isolation (RCSI), or Snapshot Isolation (SI). I don’t want you to think it doesn’t exist — it totally does — it’s just not the default. You have to turn it on.

There are interesting differences between the two, which you can read about here. The biggest general difference is that when you turn on RCSI, all read queries will start using it unless your request asks for a different one. You can do that in a connection string, by setting the transaction isolation level, or using table hints in the query (like NOLOCK).
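For reference, both are database-level switches. Flipping RCSI on needs exclusive access to the database, so either do it during a quiet moment or be prepared to kick sessions out (the database name is a placeholder — test somewhere safe first):

```sql
-- Read Committed Snapshot Isolation: read queries start using it automatically
ALTER DATABASE YourDatabase
    SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE;

-- Snapshot Isolation: a separate switch, opted into per-session with
-- SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
ALTER DATABASE YourDatabase
    SET ALLOW_SNAPSHOT_ISOLATION ON;
```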

Overall, this is my favorite way to compensate for locking waits. Sure, you’ll still have blocking for write queries, but your read queries will have a much easier time of things.

Writes vs Writes


When you need to get write queries to cooperate, there are some general steps you can take.

  • Get rid of under-used indexes
  • Batch modifications into smaller chunks
  • Tune the modification queries to be as fast as possible
  • Make sure your modification queries have the right indexes
  • Don’t do goofy things with triggers and foreign keys (especially cascading ones)
  • Only use Change Tracking or Change Data Capture if you really need to

Is this stuff covered in glory? No, not usually. But it’s taking care of the little stuff along the way that makes life easier as your database and userbase grow.

  • Cleaning up poorly utilized indexes will give you fewer objects to lock when queries need to modify data.
  • Batching modification queries gives you a more reliable number of locks, and is kinder on your transaction log
  • Tuning modification queries to get rid of inefficiencies will give you shorter locking durations
  • Giving update and delete queries good indexes to find rows of interest will do the same
  • Not making SQL Server do extra work as data changes gives you fewer side effects to worry about
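As an example of the batching point, a big delete usually ends up as a loop with a TOP in it. A rough sketch (the table, filter, and batch size are made up; tune the batch size to what your transaction log and blocking tolerance can handle):

```sql
DECLARE @rows int = 1;

WHILE @rows > 0
BEGIN
    -- delete in bites of 5,000 rows so locks and log use stay predictable
    DELETE TOP (5000) s
    FROM dbo.stale_rows AS s
    WHERE s.created_date < '20200101';

    SET @rows = @@ROWCOUNT;
END;
```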

We’ve covered a lot of ground this month, and I hope you’ve been enjoying it. The last thing we have to talk about is query plans, which is going to help us start tying a lot of stuff together.

Thanks for reading!


LOB Data And Weird Locks In SQL Server

Working On It


I’ve been re-working a lot of the demos in a presentation I’m putting together called Index Internals That Matter For Performance, because after a first walk-through, I had a lot of realizations about not only what I was presenting, but how I was presenting it, and the order I was presenting it in.

This isn’t abnormal, and it’s hardly my finest moment in video history. But I wanted to stream it because a lot of people are out there who might be thinking about getting into blogging or presenting, and I wanted to show that it’s a process, not something you just walk out and nail like a half-court-no-look-over-the-shoulder-nothing-but-net-shot.

Anyway, I came across a weird thing, and had to make some changes to my helper script WhatsUpLocks to get some more information.

And not get blocked.

Read Committed is a trash isolation level. Don’t @ me.

The Not Weirdness


This is the simplest I could boil things down to. I don’t have anything very practical to say about it right now.

Here’s an update:

BEGIN TRAN

UPDATE p
    SET p.Score += 1
FROM dbo.Posts AS p
WHERE p.Id = 999;

ROLLBACK

If we run it up to the rollback, it finishes pretty quickly. We are, after all, just updating a single row that we locate via the primary key.

My original idea for the demo was to show some of the odder things you can run into with blocking, so I wrote this query to return a bunch of rows, but get blocked at the very end.

SELECT TOP (100) p.Id, p.Body
FROM dbo.Posts AS p
WHERE p.Id > 900;

Which is exactly what happens. We get to Id 997 and crap out.

[Screenshot: query results — “endless”]

Now if we check on those sessions with WhatsUpLocks, we can see what happened.

SELECT *
FROM dbo.WhatsUpLocks(58) AS wul; --Writer SPID

SELECT *
FROM dbo.WhatsUpLocks(57) AS wul; --Reader SPID
[Screenshot: lock output — “i am stuck”]

Why is this not weird? Well, comparatively, we take a normal number of overall locks and get blocked in a fairly predictable spot. We get blocked waiting on one of the keys that we need to keep going.

The Weirdness


To backtrack a little bit, part of what I wanted to show was that using ORDER BY can sometimes result in “more” blocking. I don’t mean more locks; what I mean is that when we need to order by Score, but we don’t have Score indexed in a useful way, the query will get hung up without returning any rows whatsoever.

SELECT TOP (100) p.Id, p.Body
FROM dbo.Posts AS p
WHERE p.Id > 900
ORDER BY p.Score;

Originally this was a SELECT * query, but I want to show you that it’s specific to the Body column because it’s an NVARCHAR(MAX).

Here’s what comes back from looking at the locks now:

[Screenshot: lock output — “uwot”]

LOOK HOW MANY LOCKS WE TAKE ON PAGES. That’s bananas.

Watch my video on readers blocking writers for a little background on why this could be troublesome.

If I change my query to not have the Body column in the select list, the locks go back to normal.

SELECT TOP (100) p.Id, p.Score
FROM dbo.Posts AS p
WHERE p.Id > 900
ORDER BY p.Score;
[Screenshot: lock output — “nermal”]

Of course, sticking Body in the WHERE clause results in an uptick in shared locks taken:

SELECT TOP (100) p.Id, p.Score
FROM dbo.Posts AS p
WHERE p.Id > 900
AND p.Body LIKE N'_%'
ORDER BY p.Score;
[Screenshot: lock output — “that’s nice, dear”]

But Of Course…


This kind of thing is maybe not the most likely thing you’ll see happening IRL, because you probably have other indexes that queries can use to access data in different ways. For instance, if I have this index on the Posts table, the first query will still get blocked, but all of the other queries will finish instantly.

CREATE INDEX surprise ON dbo.Posts(Score, Id);

Is this another reason to avoid MAX datatypes? Maybe probably.

Don’t forget that they also prevent predicate pushdown, and they can really mess with memory grants.

But hey, I’m sure you have more than enough indexes to fix everything.

Thanks for reading!


How Parameter Sniffing Impacts Locking In SQL Server

Make It Out


I’m going to show you something bizarre. I’m going to show it to you because I care for your well-being and long term mental health.

Someday you’ll run into this and be thoroughly baffled, and I want to be here for you. Waiting, watching, lubricating.

I have a stored procedure. It’s a wonderful stored procedure.

But something funny happens when a parameter gets sniffed.

Wrote A Little Song About It


It’s not the usual parameter sniffing thing, where you get different plans and blah blah blah. That’s dull.

This is even more infuriating. Here’s the part we care about, where we read data to insert into the #temp table.

[Screenshot: query plan — “something new”]

This is the “small” version of the plan. It only moves about 8200 rows.

Now here’s the “big” version of the plan.

[Screenshot: query plan — “practice makes”]

We move way more rows out of the seek (9.8 million), but doesn’t it seem weird that a seek would take 5.6 seconds?

I think so.

Pay special attention here, because both queries aggregate the result down to one row, and the insert to the #temp table is instant both times.

Wanna Hear It?


Let’s do what most good parameter sniffing problem solvers do, and re-run the procedure after recompiling for the “big” value.

[Screenshot: query plan — “escapism”]

It’s the exact same darn plan.

Normally, when you’re dealing with parameter sniffing, and you recompile a procedure, you get a different plan for different values.

Not here though. Yes, it’s faster, but it’s the same operators. Seek, Compute, Stream, Stream, Compute, Insert 1 row.

Important to note here is that the two stream aggregates take around the same amount of time as before, too.

The real speed up was in the Seek.

How do you make a Seek faster?

YOU NEEK UP ON IT.

Three Days Later


I just woke up from beating myself with a hammer. Sorry about what I wrote before. That wasn’t funny.

But okay, really, what happened? Why is one Seek 4 seconds faster than another seek?

Locking.

All queries do it, and we can prove that’s what’s going on here by adding a locking hint to our select query.

Now, I understand why NOLOCK would set your DBA in-crowd friends off, and how TABLOCK would be an affront to all sense and reason for a select.

So how about a PAGLOCK hint? That’s somewhere in the middle.
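The procedure itself isn’t shown here, but the shape of the hinted read would be something like this (the table, columns, and parameter value are stand-ins for whatever your query touches):

```sql
CREATE TABLE #t (records bigint NOT NULL);

DECLARE @OwnerUserId int = 22656; -- placeholder parameter value

INSERT #t (records)
SELECT COUNT_BIG(*) AS records
FROM dbo.Posts AS p WITH (PAGLOCK) -- ask for page locks from the start
WHERE p.OwnerUserId = @OwnerUserId;
```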

[Screenshot: query plan — “what we got here”]

The Seek that took 5.6 seconds is down to 2.2 seconds.

And all this time people told you hints were bad and evil, eh?

YTHO?


It’s pretty simple, once you talk it out.

All queries take locks (even NOLOCK/READ UNCOMMITTED queries).

Lock escalation doesn’t usually happen with them though, because locks don’t accumulate with read queries the way they do with modification queries. They grab on real quick and then let go (except when…).

For the “small” plan, we start taking row locks, and keep taking row locks. The optimizer has informed the storage engine that ain’t much ado about whatnot here, because the estimate (which is correct) is only for 8,190 rows.

That estimate is preserved for the “big” plan that has to go and get a lot more rows. Taking all those additional row locks really slows things down.

No Accumulation, No Escalation.

We keep taking 9.8 million row locks instead of escalating up to page or object locks.

When we request page locks from the get-go, we incur less overhead.
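If you want to watch this happen, you can count locks from another session with sys.dm_tran_locks while the query runs (swap in the right session_id for your setup):

```sql
SELECT
    tl.resource_type,
    tl.request_mode,
    locks = COUNT_BIG(*)
FROM sys.dm_tran_locks AS tl
WHERE tl.request_session_id = 58 -- the busy query's spid; yours will differ
GROUP BY tl.resource_type, tl.request_mode
ORDER BY locks DESC;
```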

For the record:

  • PAGLOCK: 2.4 seconds
  • TABLOCK: 2.4 seconds
  • NOLOCK: 2.4 seconds

Nothing seems to go quite as fast as when we start with the “big” parameter, but there’s another reason for that.

When we use the “big” parameter, we get batch mode on the Seek.

[Screenshot: query plan tooltip — “A FULL SECOND”]

Welcome to 2019, pal.

Thanks for reading!
