Batching Modification Queries Is Great Until…

Scarecrow


I don’t mean to put anyone off from using batching to make large modifications more efficient.

In this case, more efficient doesn’t necessarily mean faster. You still have this complete unit of work to get through.

The whole point is to go a little bit easier on your server:

  • Smaller transactions sent to the log file
  • Shorter locking durations
  • Less buildup of the version store (optimistic isolation levels, triggers, etc.)

But there are things that make batching impossible, or at least downright tricky.

Atom Bomb


Let’s say you need to update 10 million rows, and you’re using batches of 5000 rows.

What happens when one batch throws an error?

If you were running the whole thing as a single query, the whole thing would roll back. Depending on how far along you’ve gotten, that could be pretty painful (especially for you suckers that don’t have Accelerated Database Recovery enabled).

How much that sucks also depends on the error. Not every error means the update violated some database rule you’ve set up via constraints.

But all that aside, there are some circumstances where maybe the entire thing should fail and roll back.

In those cases, you’ll need something that keeps track of the “before” rows. There are ways to do this within SQL Server that don’t require you to program anything:

  • Change Data Capture
  • Change Tracking
  • Temporal Tables

If you can afford to use those, it might be a pretty “easy” way of tracking changes to your data so that you can roll it back.

But it’s up to you to write code that uses any of those things to find the old values and do the ol’ natty switcheroo.
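For example, here’s a minimal sketch of that switcheroo with a temporal table, using the demo table you’ll meet in the next section. The AS OF timestamp, and the idea that dbo.as_i_am is system-versioned at all, are assumptions for illustration:

/*A sketch, assuming dbo.as_i_am were system-versioned,
  and you knew a point in time before the botched update*/
UPDATE
    aia
    SET aia.some_date = h.some_date,
        aia.some_string = h.some_string
FROM dbo.as_i_am AS aia
JOIN dbo.as_i_am
    FOR SYSTEM_TIME AS OF '20240101 00:00:00' AS h
    ON h.id = aia.id;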

Batchy Bomb


It is possible to save off all the rows you’ve changed to another table, and then reverse the changes.

The table load and setup scripts are here. To use them, you can do something like this.
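Here’s a guess at the shape of the “save” table, since it only needs the key plus the columns you’re changing. The column types are my assumptions, not the actual script:

/*Assumed shape of the “save” table; types are guesses, not gospel*/
CREATE TABLE dbo.as_i_was
(
    id int NOT NULL,
    some_date datetime NOT NULL,
    some_string varchar(100) NOT NULL
);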

Then you can use Michael’s batching code, with an OUTPUT clause to your “save” table.

/*Batch update with output*/
-- From https://michaeljswart.com/2014/09/take-care-when-scripting-batches/
DECLARE
    @LargestKeyProcessed int = -1,
    @NextBatchMax int,
    @RC int = 1;
 
WHILE (@RC > 0)
BEGIN
    /*Starting place*/
    SELECT
        @NextBatchMax  = 
            MAX(x.id)
    FROM
    (
        SELECT TOP (1000) 
            aia.id
        FROM dbo.as_i_am AS aia
        WHERE aia.id > @LargestKeyProcessed /*strictly greater, so the previous batch's last row isn't grabbed again*/
        ORDER BY 
            aia.id ASC
    ) AS x;
    
    /*Updateroo*/
    UPDATE
        aia
        SET aia.some_date = 
                DATEADD(YEAR, 1, aia.some_date),
            aia.some_string = 
                aia.some_string + LEFT(aia.some_string, 1)
        OUTPUT 
            Deleted.id, Deleted.some_date, Deleted.some_string
        INTO
            dbo.as_i_was (id, some_date, some_string)
    FROM dbo.as_i_am AS aia
    WHERE aia.id > @LargestKeyProcessed /*> not >=, or boundary rows get updated twice*/
    AND   aia.id <= @NextBatchMax;
    
    /*Enhance*/
    SET @RC = @@ROWCOUNT;
    SET @LargestKeyProcessed = @NextBatchMax;
END;

The only thing I’m doing differently here, aside from using my own tables, is adding OUTPUT to dump the prior row versions into a new table.

That way, if a batch fails, I can roll things back.
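Rolling back by hand is then one more update that reads the saved values back in. A minimal sketch, assuming dbo.as_i_was ends up with one saved row per id:

/*Put the “before” versions back; assumes one saved row per id*/
UPDATE
    aia
    SET aia.some_date = aiw.some_date,
        aia.some_string = aiw.some_string
FROM dbo.as_i_am AS aia
JOIN dbo.as_i_was AS aiw
    ON aiw.id = aia.id;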

Witch Errors


One thing that you should figure out is which errors you want to guard against. Lock and deadlock errors are common ones.

You can do something like this:

DECLARE
    @do_over tinyint = 0,
    @game_over tinyint = 5;

SET NOCOUNT, XACT_ABORT ON; /*once, before the loop*/

WHILE @do_over <= @game_over
BEGIN
    BEGIN TRY
        UPDATE dbo.as_i_am SET some_string = REPLICATE('ack', 1) WHERE id = 138; /*do a thing*/
        BREAK; /*exit loop if the thing succeeds*/
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() IN (1204, 1205, 1222) /*lock and deadlock errors*/
        BEGIN
            SELECT
                @do_over += 1; /*count the retry*/
            WAITFOR DELAY '00:00:10'; /*cool off before the next attempt*/
        END;
        ELSE
        BEGIN
            /*log some details to a table here, terminated with a semicolon so THROW parses*/
            THROW; /*anything else isn't worth retrying*/
        END;
    END CATCH;
END;

IF @do_over > @game_over
BEGIN
    /*retries exhausted; fail loudly instead of exiting quietly*/
    THROW 50000, 'Batch retries exhausted.', 1;
END;

This is somewhat incomplete pseudocode, but it looks good enough to blog. Don’t judge me too harshly; just use it to get where you need to go.
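If you want to fill in the “log some details” step, a bare-bones version might look like the below. The table name and shape are my own invention, not anything built in:

/*A bare-bones error log; table name and shape are made up for this sketch*/
CREATE TABLE dbo.batch_error_log
(
    logged_at datetime2 NOT NULL
        DEFAULT SYSDATETIME(),
    error_number int NULL,
    error_severity int NULL,
    error_state int NULL,
    error_message nvarchar(4000) NULL
);

/*Inside the CATCH block, right before THROW*/
INSERT
    dbo.batch_error_log
(error_number, error_severity, error_state, error_message)
SELECT
    ERROR_NUMBER(),
    ERROR_SEVERITY(),
    ERROR_STATE(),
    ERROR_MESSAGE();

Note the semicolon terminating the INSERT: THROW needs the statement before it to be properly terminated, which a commenter below also points out.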

Other errors, like primary key and foreign key violations, check constraint violations, string truncation, and anything else that points to a data quality issue, shouldn’t be retried.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.



5 thoughts on “Batching Modification Queries Is Great Until…”

  1. May I repeat the words of Erland Sommarskog?

    Set xact_abort, nocount on

    If @@trancount > 1 rollback

    And not to forget;

    ;Throw
    Not
    throw;

    Agree?

  2. My approach is that this sort of operation needs the db in single user mode & all application users shut out. Then start with a database snapshot – if halfway through there is an error you can restore from the snapshot. You wouldn’t want a user to grab an exclusive lock on a row on the table in question anyway.
