Starting SQL: Picking Clustered Indexes For Tables

Need’em Got’em

Let’s say you’ve either got a heap, like we talked about yesterday, or you’re plotting a new table.

It’s OLTP, and you want the very best for your transactions. What are some good characteristics of a clustered index?

Here’s an easy to remember acronym!

Narrow: Numbers, dates, not string-ish
Unique: Identities or sequences are good for this
Distinct: If not unique, then as distinct as possible
Ever: Increasing (append only is the goal, here
Static: Definitely not a column you ever update

Follow that, and you’ll most likely never go wrong. Unless you’ve got GUIDs involved, maybe. But at least you’re trying.

Why do we care about those things?

Avoiding Problems

All of those traits help us avoid problems down the line. like shuffling pages in the middle around, or with lopsided data.

The narrow thing is important because, like I’ve mentioned, nonclustered indexes inherit those clustered index key columns. If you assign wide clustered indexes, either via the number of key columns, or via the data types of those key columns, you can end up with quite bloated nonclustered indexes.

The whole thing can be unpleasant.

The important thing to remember is that clustered indexes are a nearly free way to logically order your table data, and have an index that fully covers all of the columns. This is probably a good time to remind you that clustered indexes are not a copy of the table, but the table data itself ordered logically by the key columns.

The Questions

To BIGINT Or Not To BIGINT?

If you’re going to do what most people do and make your clustered index an identity, do yourself a favor and use a BIGINT.

Some people will hem and haw about the 4 extra bytes, but those 4 extra bytes don’t make too much difference until the table gets to be around 2 billion rows. At that point, the pain of altering an INT to a BIGINT can be pretty severe, with some careful planning.

I’ve seen enough people go through heck with it to tell you to avoid the problem from the get go.

But I Need So Many GUIDs

Okay, cool. I understand. GUIDs are hard to run out of. They can be rough if not generated sequentially, and even then… Well, at least they’re unique. Your nonclustered indexes won’t suffer *too* much by inheriting them.

But this is generally where a different design pattern makes sense:

Put a nonclustered primary key on your GUID
Add a clustered index to a more sensible column

But hey, businesses seem to get angry when they find out identities can have gaps in values. GUIDs totally don’t have that problem ?

Are Multi-Keys Okay?

I guess, if you need a couple columns to make a unique combination, or if you have a couple columns you’re always searching on.

Just make sure you really try to keep them narrow. Strings usually aren’t your friend here.

Action Indexes

But you know, once you’ve got a clustered index in place, it just kinda sits there.

The real fun comes in figuring out nonclustered indexes

How many can I have?
How many columns can I put in them?

Let’s talk about those a little next.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

2 thoughts on “Starting SQL: Picking Clustered Indexes For Tables”

SR says:

February 24, 2022 at 7:52 am

Hello, I love your columns, and your conversation style of writing with all the obscure quote-references, which is like my style (and there are so few of us!)
I struggle to understand how a Clustered Index can be: “the table data itself ordered logically by the key columns” unless it was based on a strictly ascending Identity column of some kind with no gaps. What if it was based on an ID that was basically just a randomly assigned number? The table would have to be rewritten every time one was added. Or… it is some kind of Heap structure, and not actually written in literal sequence “on the disk” (as if those exist). Yes? Thanks!
1. Erik Darling says:
  
  March 2, 2022 at 9:37 am
  
  Thanks for the kind words!
  
  Logical ordering isn’t the same as physical ordering. Double-linked lists and slot arrays, etc. mean pages and values don’t have to be physically ordered at write-time, either on disk or in-memory.

Comments are closed.