No Way, No How
This is a list of things I see in data warehouses that make me physically ill:
- Unique constraints of any kind: Primary Keys, Indexes, etc. Make things unique during your staging process. Don’t make your indexes do that work.
- Foreign Keys: Referential integrity should be guaranteed from your data source. If it can’t be, there’s no sense in making it happen in your data warehouse. Foreign Keys in SQL Server suck anyway, and slow the hell out of large data loads.
- Clustered row store indexes: At this point in time, when you need a clustered index, it ought to be a clustered column store index.
- Tables with “lots” of nonclustered row store indexes: They’ll only slow down your load times a whole bunch. Replace them with nonclustered column store indexes.
- Indexed views: This isn’t 2012 anymore. Column store, column store, column store.
- Standard Edition: The CPU limit of 24 cores is probably fine, but the buffer pool cap of 128GB and strict limitations on column store/batch mode are horrendous.
I know what you’re thinking looking at this list: I can drop and re-create things like unique constraints, foreign keys, and nonclustered indexes. You sure can, but you’re wasting a ton of time.
Data warehouses have a completely different set of needs from transaction systems. The sooner you stop treating data warehouses like AdventureWorks, the better.
Thanks for reading!
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount on to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.