I often see people get stuck hard by this. Even worse, it happens when they’re using a merge statement, which are like throwing SQL Server a knuckleball.
It has no idea what it might have to do with your merge — insert? update? delete? — so it has to prepare a plan for any of them that you specify.
Just don’t use merge, okay? If you take one thing from this whole series: please don’t use merge.
Okay, anyway, back to the point: large modifications can suck in a few different ways.
The whole time those big modifications are running, other queries are gonna get blocked. Even with NOLOCK/UNCOMMITTED hints, other modification queries can get stuck behind them. Wanna make users feel some pain? Have your app be unusable for big chunks of time because you refuse to chunk your modifications. Worse, if enough queries get backed up behind one of these monsters, you can end up running out of worker threads, which is an even worse performance issue.
The more records you need to change, the more transaction logging you have to do. Even in simple recovery, you log the same amount of changes here (unless your insert gets minimal logging). A lot of people think simple recovery means less logging, but no, it just means that SQL Server manages the transaction for you. This’ll get worse as you add more indexes to the table, because change for each of them are logged separately.
The modification part of any update or delete happens single-threaded. Other parts of the query plan might go parallel, but the actual modification portion can’t. Getting a few million rows ready on a bunch of threads simultaneously might be fast, but then actually doing the modification can be pretty slow. You have to gather all those threads down to a single one.
It goes without saying that large modifications will want object-level locks. If there are incompatible locks, they may end up blocked. If they started by taking row or page locks, and tried to escalate to an object level lock but couldn’t, you could end up gobbling up a whole lot of your lock memory, which is a finite resource. Remember, there’s no escalation or transition between row and page locks. This is another place where having a lot of indexes hanging around can hurt.
Buffer Pool Pollution:
If you’re the type of person who isn’t regularly declutter your indexes, it’s likely that you have a bunch of indexes that either don’t get used anymore, only rarely get used, or are duplicative of other indexes defined on a table. Just like with transaction logging an lock escalation, the more indexes you have around, the more of them you need to read up into SQL Server’s buffer pool to modify them. SQL Server doesn’t work with pages on disk.
How fast these queries run will be partially be dictated by:
- How much memory you have
- How fast your disks are
- How fast your storage networking is
There are other factors too, like the type of data you’re changing. Changing MAX data types has way more overhead than more reasonable ones, or even shorter strings.
Thanks for reading!
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount on to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.