Let’s say you’ve either got a heap, like we talked about yesterday, or you’re plotting a new table.
It’s OLTP, and you want the very best for your transactions. What are some good characteristics of a clustered index?
Here’s an easy to remember acronym!
- Narrow: Numbers, dates, not string-ish
- Unique: Identities or sequences are good for this
- Distinct: If not unique, then as distinct as possible
- Ever: Increasing (append only is the goal, here
- Static: Definitely not a column you ever update
Follow that, and you’ll most likely never go wrong. Unless you’ve got GUIDs involved, maybe. But at least you’re trying.
Why do we care about those things?
All of those traits help us avoid problems down the line. like shuffling pages in the middle around, or with lopsided data.
The narrow thing is important because, like I’ve mentioned, nonclustered indexes inherit those clustered index key columns. If you assign wide clustered indexes, either via the number of key columns, or via the data types of those key columns, you can end up with quite bloated nonclustered indexes.
The whole thing can be unpleasant.
The important thing to remember is that clustered indexes are a nearly free way to logically order your table data, and have an index that fully covers all of the columns. This is probably a good time to remind you that clustered indexes are not a copy of the table, but the table data itself ordered logically by the key columns.
To BIGINT Or Not To BIGINT?
If you’re going to do what most people do and make your clustered index an identity, do yourself a favor and use a BIGINT.
Some people will hem and haw about the 4 extra bytes, but those 4 extra bytes don’t make too much difference until the table gets to be around 2 billion rows. At that point, the pain of altering an INT to a BIGINT can be pretty severe, with some careful planning.
I’ve seen enough people go through heck with it to tell you to avoid the problem from the get go.
But I Need So Many GUIDs
Okay, cool. I understand. GUIDs are hard to run out of. They can be rough if not generated sequentially, and even then… Well, at least they’re unique. Your nonclustered indexes won’t suffer *too* much by inheriting them.
But this is generally where a different design pattern makes sense:
- Put a nonclustered primary key on your GUID
- Add a clustered index to a more sensible column
But hey, businesses seem to get angry when they find out identities can have gaps in values. GUIDs totally don’t have that problem 😃
Are Multi-Keys Okay?
I guess, if you need a couple columns to make a unique combination, or if you have a couple columns you’re always searching on.
Just make sure you really try to keep them narrow. Strings usually aren’t your friend here.
But you know, once you’ve got a clustered index in place, it just kinda sits there.
The real fun comes in figuring out nonclustered indexes
- How many can I have?
- How many columns can I put in them?
Let’s talk about those a little next.
Thanks for reading!
For the entire month of August, you can get 75% off my recorded video training with the coupon code “cruelsummer”