Software Vendor Mistakes With SQL Server: Using Integers Instead Of Big Integers For Identity Columns

Schemathematics


I’ve had far too many clients get stuck running out of identity values for a table that grew a lot faster than they thought it would. The change is an ugly one for many reasons.

Though you can setup a process to make the change easier, and some changes are available as metadata-only, most people either have way more complications involved (foreign keys, triggers, etc.) or aren’t practically set up to use the metadata-only solution by having compressed indexes, and all the whatnot outlined in the linked post.

I was hoping that vNext would take care of the change being painful, but so far I haven’t heard of anything.

The integer maximum is, of course, 2,147,483,647 (2147483647). The big integer maximum is, of course 9,223,372,036,854,775,807 (9223372036854775807). The big integer maximum is around 4294967298 times larger. That gives you a lot more runway before you run out.

Of course, it comes with a trade off: you’re storing 8 bytes instead of 4. But my favorite way of explaining why that’s worth it is this: by the time you notice that 4 extra bytes of storage space, you’re probably about to run out of integers anyway.

Masters Of My Domain Knowledge


You don’t need to do this for static lists, or for things with an otherwise limited population. For example, if you were going to make a table of every one in your country, you could still use an integer. Even in the most populous countries on earth, you could probably survive a while with an integer.

The problem comes when you start tracking many to one relations.

An easy thing to imagine is transactions, where each user will likely have many credits and debits. Or if you’re more keen on the Stack Overflow database, each user will ask many questions and post many answers.

Hopefully, anyway. In reality, most users ask one terrible question and never come back, even if their terrible questions gets a really good answer.

The point is that once enough users have some degree of frequent activity, that identity column/sequence object will start racking up some pretty high scores. Forget the last page contention issues, there are much easier ways of dealing with those. Your problem is hitting that integer wall.

Aside from using a big integer, you could fiddle with resetting the identity or sequence value to the negative max value, but that makes people queasy for an entirely different set of reasons.

Wizzed’em


Any table in your database that’s set to represent individual actions by your users should use a big integer as an identity value, assuming you’re going the surrogate key route that utilizes an identity column, or a column based on a sequence object.

If you use a regular integer, you’re asking for problems later. Choose the form of your destructor:

  • Recycling identity values
  • Changing to a big integer
  • Deforming your table to use a GUID
  • Let’s call the whole thing off

It’s not easy making data layer changes once things have grown to the point where you’re starting to hit hard limits and boundaries. Anything you can do to protect yourself from the get-go is a must.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.