Data Loading and Referential Integrity

Oh So Helpful


I got a question from a past client recently about how a new data loading tool was side-stepping constraints. After taking a quick look at what was running, it was using BULK INSERT statements to do the load.

There’s nothing wrong that that! I’ve used it quite a bit, and found it very handy to efficiently load data from files. There really is a ton you can do with it.

But with flexibility comes, well, happy accidents.

Check Me Out


To show you what I mean, I’m going to use some quick and dirty examples. This isn’t meant to show you a perfect export process by any means.

I’m going to use BCP to output a small text file, and load it into a table in another database.

EXEC xp_cmdshell 'bcp "SELECT TOP (1000) * FROM StackOverflow2013.dbo.Badges ORDER BY Id;" queryout "c:\temp\Badges.txt" -w -T -S NADABRUTO\SQL2019';

Queue Ball


Let’s look at the table for a minute. There’s not a lot going on with it, but there is a check constraint:

ALTER TABLE Crap.dbo.Badges 
    ADD CONSTRAINT ck_bootypie 
        CHECK(LOWER(Name) NOT IN (N'beavis', N'butthead'));

Yes, it’s a bit of nonsense, but it doesn’t need to be realistic to show you the side effect.

It just has to entertain me for the duration of writing this post while stone cold sober.

Mostly.

See, right now, it’s trusted. Or not not trusted.

SELECT cc.name,
       SCHEMA_NAME(cc.schema_id) AS schema_name,
       OBJECT_NAME(cc.parent_object_id) AS object_name,
       cc.type_desc,
       cc.is_not_trusted
FROM sys.check_constraints AS cc
So solo

Inserts From A Different Room


Let’s take a regular old insert. If we do this, our constraint will still be trusted.

INSERT Crap.dbo.Badges ( Name, UserId, Date )
SELECT TOP (1000) Name, UserId, Date 
FROM StackOverflow2013.dbo.Badges ORDER BY Id;

You’ll have to trust me on that. I mean, would anyone really know if I reused the same picture of the output?

Now here’s an insert with BULK INSERT.

BULK INSERT dbo.Badges
FROM 'c:\temp\Badges.txt'
WITH
    ( DATAFILETYPE = 'widechar',
      BATCHSIZE = 1048576,
      CODEPAGE = 'RAW',
      FIRSTROW = 1,
      FIELDTERMINATOR = '\t',
      ROWTERMINATOR = '\n',
      KEEPNULLS,
      MAXERRORS = 2147483647,
      CHECK_CONSTRAINTS, /*Quote me in, Quote me out*/
      TABLOCK );

If you notice the second to last option there, CHECK_CONSTRAINTS? That’s what SQL Server uses to figure out if it should evaluate constraints when the file imports.

This goes for both foreign keys and check constraints. It does not go for primary key or unique constraints. Those will still throw errors if violated.

If you run your BULK INSERT commands without that included, they’ll be marked as untrusted after you insert data.

But now it comes time to choose your own adventure.

The Optimizing Game


Now that we know the behavior, we can examine our choices.

Option 1: Evaluate Constraints On Load

  • This can slow down data loads
  • May cause additional blocking
  • You won’t have any work to do later
  • But… your loads may fail.
  • Then you’ll have to go look in text files to figure out why

Option 2: Load it all, deal with issues later

  • Data loads will be quicker
  • You’ll likely see less blocking
  • But you’ll have to try to re-trust your check constraints afterwards
  • If you’ve got large tables, or no good indexes to support constraints, this can be awful
  • Trusting constraints may fail, but figuring out which rows are broken is probably easier

If we want to re-trust our constraint, we’ll need to run a command like this:

ALTER TABLE Crap.dbo.Badges 
    WITH CHECK CHECK CONSTRAINT ck_bootypie;

If you wanted to build dynamic commands to re-trust them across a bunch of tables, you’d need something like this:

SELECT N'ALTER TABLE ' 
       + QUOTENAME(s.name) 
       + N'.' 
       + QUOTENAME(o.name) 
       + N' WITH CHECK CHECK CONSTRAINT '
       + QUOTENAME(f.name) AS utrusted_fk
FROM sys.foreign_keys AS f
INNER JOIN sys.objects AS o
    ON f.parent_object_id = o.object_id
INNER JOIN sys.schemas AS s
    ON o.schema_id = s.schema_id
WHERE f.is_not_trusted = 1;


SELECT N'ALTER TABLE ' 
       + QUOTENAME(s.name) 
       + N'.' 
       + QUOTENAME(o.name) 
       + N' WITH CHECK CHECK CONSTRAINT '
       + QUOTENAME(c.name) AS utrusted_ck
FROM sys.check_constraints AS c
INNER JOIN sys.objects AS o
    ON c.parent_object_id = o.object_id
INNER JOIN sys.schemas AS s
    ON o.schema_id = s.schema_id
WHERE c.is_not_trusted = 1;

Go ahead and give those a run. You might be surprised what you find.

Thanks for reading!



Leave a Reply

Your email address will not be published. Required fields are marked *