Why Reporting Tables Aren’t A Great Idea

Headway


I’ve seen a lot of crazy ways for people to run reports on live data, but one of the worst ideas is creating a reporting table based on the set of data a user wants, and then letting them query that data.

Here’s why:

  • You still have to run the query to populate them, which hits the server with load you’re adverse to
  • The buffer pool gets taken up by all these reporting tables, which only one person uses
  • It’s hard to index the reporting tables, and column store isn’t useful on Standard Edition
  • They get stale quickly, and refreshing them means re-running the original query, or
  • A dynamic Merge statement, because no two tables will have the same columns
  • Generic reporting tables will often use the wrong data types for columns and be very wide

Put all that together, and you’ve got a pretty bad solution for your server and your users.

Money Talks


I’m gonna say that there are better ways to do this, and I’m gonna be right, but they’re all gonna take work and/or money.

The ones that take money are reporting off a copy of the data, which you can do with Availability Groups, Replication, or Log Shipping. You have to fully license the other server that you’re reading from no matter which you choose.

I have no interest in deep-diving into any one of those here, but I’m gonna generalize a little bit to make sure you’re not totally in the dark:

  • Availability Groups are easy to set up, sometimes complicated to manage and patch, and require databases in full recovery model, but give you pretty up-to-date data on the secondary for users to get at.
  • Replication is a lot like the above, except it makes me way more nervous. Snapshot Replication gets bonus points for allowing you to create different indexes on the subscriber, which AGs and Log Shipping don’t let you do. The freshness of the data does depend on which type of Replication you choose, so choose wisely.
  • Log Shipping just ships transaction log backups, which is pretty dead simple unless you need to deal with servers that can’t directly see each other for whatever reason. Simple is nice, but simple means you’re gonna have to stale data, since you have to kick users out to restore more transaction logs.

Manifesto


The solutions that require work are usually ones that you implement locally. I’m not going to talk about data warehouses here, because it’s a lot like the stuff up there, just with ETL in the middle. Or ELT. Whatever.

Getting reporting working from live data usually takes a few steps, depending on your situation:

  • Nonclustered column store indexes if you’re on Enterprise Edition
  • Indexed views if you’re on Standard Edition (I’ll talk more about this in a minute)
  • Enabling snapshot isolation just for reporting queries to not get blocked
  • Tuning the hell out of your report code to not beat the server up too bad

Indexed views have a lot of rules, and if you’re lucky you can get some mileage out of them to speed up aggregations on Standard Edition. You can try using column store, but it’s really severely hobbled on Standard Edition. The DOP two limit is especially a killer in a lot of cases.

All of this is work though, and it at least partially assumes your server can come anywhere near being able to handle the additional workload. If it’s just limping along now, you probably don’t wanna keep throwing straw at Joe Camel.

Out There


What I left out a little bit are tables that¬†everyone can report from, because every time I’ve seen them implemented, it’s been an absolute mess of triggers or periodic data synchronizations that left users unhappy.

Either because their transactional workload slowed way down to keep things up to date synchronously, or because the background process that started out running every 10 minutes beat the crap out of the server every 10 minutes, so now it runs every hour but there’s another Agent job to kill that job if it runs more than five minutes because that means something bad is about to happen.

This is why most of the time, I’m happy to do the work to report off the live data.

Thanks for reading!



4 thoughts on “Why Reporting Tables Aren’t A Great Idea

  1. Over a decade ago, I worked for a company that didn’t believe in spending money on IT. So, a single standard edition SQL 2008 R2 instance with log shipping to an even more underpowered remote instance for DR. I couldn’t get enough RAM for the main instance, they sure weren’t going to pay for an AG. They had very complex reporting stored procedures with many users that took minutes to run, but required the reports have data within 15 minutes of current and I couldn’t slow down insert/update/delete transactions. They couldn’t tolerate getting kicked out of that DR instance every 15 minutes when the log shipping took place. Indexed views weren’t an option due to the complexity of the reporting stored procedures (lots of aggregations and temp tables were involved in them).

    The only solution that met the requirements was using an Agent Job for each report that ran a stored procedures in the main instance every 15 minutes that created a reporting table with a temporary name that contained the report data and was a copy of the “real” table (built dynamically from the “real” table so index changes and the like didn’t cause maintenance for this process), inserted the results of the reporting procs into the new table, then used a partition switch (https://sqlrus.com/2018/03/using-switch-on-a-single-partition/) to swap the table with the temporary name for the old one, and then deleted the table with the temporary names (which was then the old one).

    For report users, there were only a few milliseconds every 15 minutes when the reports were unavailable, the reports loaded instantly, the load on the system was manageable, and there was no impact on transaction processing (we were using RCSI). It certainly wasn’t the best solution, but it was the best I could implement quickly for free. It worked well and never required any maintenance.

    1. That is an awesome solution and encompasses what I love about programming, unlimited creativity.
      If you get time to write, I’d love to read more about this.

Leave a Reply

Your email address will not be published. Required fields are marked *