Let’s Say We Have A Bad Query
I know you don’t have bad queries. You’re smart people. But some people do!
Let’s also say that bad query is taking part in a modification.
UPDATE u2 SET u2.Reputation *= 2 FROM Users AS u JOIN dbo.Users AS u2 ON CHARINDEX(u.DisplayName, u2.DisplayName) > 0 WHERE u2.Reputation >= 100000 AND u.Id <> u2.Id;
This query will run for so long that we’ll get sick of waiting for it. It’s really holding up writing this blog post.
Let’s try something a little different.
SELECT u.Id, u.DisplayName INTO #Users FROM dbo.Users AS u WHERE u.Reputation >= 100000; UPDATE u2 SET u2.Reputation *= 2 FROM #Users AS u JOIN dbo.Users AS u2 ON CHARINDEX(u.DisplayName, u2.DisplayName) > 0 AND u.Id <> u2.Id;
This time, the query finishes in 43 seconds.
Plans and Events
The query plan looks like this:
Extended events can tell us when locks were taken and released to perform the update.
An exclusive lock was held on the table for over 35 seconds, so all but ~7 seconds of the plan has locks on the table we’re updating.
In real life, that’d probably mean a lot of blocking.
Another way to handle this is to remove the long-running query from the update, and take advantage of the Primary Key/Clustered Index on the table.
First, isolate the rows we care about:
SELECT u2.Id, u2.Reputation * 2 AS DoubleRep INTO #Updater FROM #Users AS u JOIN dbo.Users AS u2 ON CHARINDEX(u.DisplayName, u2.DisplayName) > 0 AND u.Id <> u2.Id;
Then do the update:
UPDATE u2 SET u2.Reputation = u.DoubleRep FROM #Updater AS u JOIN dbo.Users AS u2 ON u.Id = u2.Id;
This query only runs a few hundred milliseconds, so the locks taken are shorter.
Keep in mind, though, that the first insert query is still horrible.
It runs for ~40ish seconds, but at least we don’t lock any objects that other queries might care about.
Thanks for reading!