I’ve seen a lot of crazy ways for people to run reports on live data, but one of the worst ideas is creating a reporting table based on the set of data a user wants, and then letting them query that data.
- You still have to run the query to populate them, which hits the server with exactly the load you're averse to
- The buffer pool gets taken up by all these reporting tables, which only one person uses
- It’s hard to index the reporting tables, and column store isn’t useful on Standard Edition
- They get stale quickly, and refreshing them means either re-running the original query, or writing a dynamic MERGE statement, because no two reporting tables will have the same columns
- Generic reporting tables will often use the wrong data types for columns and be very wide
Put all that together, and you’ve got a pretty bad solution for your server and your users.
I’m gonna say that there are better ways to do this, and I’m gonna be right, but they’re all gonna take work and/or money.
The ones that take money are reporting off a copy of the data, which you can do with Availability Groups, Replication, or Log Shipping. You have to fully license the other server that you’re reading from no matter which you choose.
I have no interest in deep-diving into any one of those here, but I’m gonna generalize a little bit to make sure you’re not totally in the dark:
- Availability Groups are easy to set up, sometimes complicated to manage and patch, and require databases in full recovery model, but give you pretty up-to-date data on the secondary for users to get at.
- Replication is a lot like the above, except it makes me way more nervous. Snapshot Replication gets bonus points for allowing you to create different indexes on the subscriber, which AGs and Log Shipping don’t let you do. The freshness of the data does depend on which type of Replication you choose, so choose wisely.
- Log Shipping just ships transaction log backups, which is pretty dead simple unless you need to deal with servers that can’t directly see each other for whatever reason. Simple is nice, but simple means you’re gonna have stale data, since you have to kick users out to restore more transaction logs.
The solutions that require work are usually ones that you implement locally. I’m not going to talk about data warehouses here, because it’s a lot like the stuff up there, just with ETL in the middle. Or ELT. Whatever.
Getting reporting working from live data usually takes a few steps, depending on your situation:
- Nonclustered column store indexes if you’re on Enterprise Edition
- Indexed views if you’re on Standard Edition (I’ll talk more about this in a minute)
- Enabling snapshot isolation just for reporting queries to not get blocked
- Tuning the hell out of your report code to not beat the server up too bad
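To give you an idea of the snapshot isolation piece, here’s a minimal sketch. The database, table, and column names are all made up, so treat it as a pattern, not a script to paste in:

```sql
/* Enable snapshot isolation at the database level.
   This one is an online operation, unlike READ_COMMITTED_SNAPSHOT. */
ALTER DATABASE YourDatabase
    SET ALLOW_SNAPSHOT_ISOLATION ON;

/* Then, in the reporting session only, opt in: */
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

SELECT
    o.CustomerId,
    SUM(o.Amount) AS TotalAmount
FROM dbo.Orders AS o
WHERE o.OrderDate >= '20240101'
GROUP BY o.CustomerId;
/* Readers under snapshot isolation work off row versions
   instead of blocking on (or being blocked by) writers. */
```

The nice part is that only the sessions that ask for snapshot isolation pay for it at query time, though everyone pays the row versioning tax once it’s enabled, so keep an eye on tempdb.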
Indexed views have a lot of rules, and if you’re lucky you can get some mileage out of them to speed up aggregations on Standard Edition. You can try using column store, but it’s severely hobbled on Standard Edition. The DOP 2 limit is especially a killer in a lot of cases.
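If you want to see what the indexed view route looks like, here’s a skeleton with made-up table and column names. SCHEMABINDING and COUNT_BIG(*) are two of the many rules I mentioned, and the unique clustered index is what actually materializes the thing:

```sql
/* The view itself: must be schema bound, and aggregate views
   must include COUNT_BIG(*). */
CREATE VIEW dbo.SalesByDay
WITH SCHEMABINDING
AS
SELECT
    CONVERT(date, s.OrderDate) AS OrderDay,
    SUM(s.Amount) AS TotalAmount,
    COUNT_BIG(*) AS RowsCounted
FROM dbo.Sales AS s
GROUP BY CONVERT(date, s.OrderDate);
GO

/* Until this index exists, it's just a regular view. */
CREATE UNIQUE CLUSTERED INDEX cx_SalesByDay
    ON dbo.SalesByDay (OrderDay);
```

One catch on Standard Edition: the optimizer won’t match queries to the indexed view automatically, so your reporting queries need to hit the view directly with a NOEXPAND hint.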
All of this is work though, and it at least partially assumes your server can come anywhere near being able to handle the additional workload. If it’s just limping along now, you probably don’t wanna keep throwing straw at Joe Camel.
What I left out a little bit are tables that everyone can report from, because every time I’ve seen them implemented, it’s been an absolute mess of triggers or periodic data synchronizations that left users unhappy.
Either because their transactional workload slowed way down to keep things up to date synchronously, or because the background process that started out running every 10 minutes beat the crap out of the server every 10 minutes, so now it runs every hour but there’s another Agent job to kill that job if it runs more than five minutes because that means something bad is about to happen.
This is why most of the time, I’m happy to do the work to report off the live data.
Thanks for reading!