30 Second Abs
I love helping people solve their problems. That’s probably why I stick around doing what I do instead of opening a gym.
That and I can’t deal with clanking for more than an hour at a time.
Recently I helped a client solve a problem with a broad set of causes, and it was a lot of fun uncovering the right data points to paint a picture of the problem and how to solve it.
Why So Slow?
It all started with an application. At times it would slow down.
No one was sure why, or what was happening when it did.
Over the course of looking at the server together, here’s what we found:
- The server had 32 GB of RAM, and 200 GB of data
- There were a significant number of long locking waits
- There were a significant number of PAGEIOLATCH_** waits
- Indexes had fill factor of 70%
- There were many unused indexes on important tables
- The error log was full of 15-second I/O warnings
- There was a network choke point where 10Gb Ethernet went to 1Gb iSCSI
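If you want to check your own server for a couple of these symptoms, here's a sketch of the kind of unused-index check we ran. The DMVs are standard, but the query shape is mine, and the usage counters reset when SQL Server restarts, so read them with that in mind:

```sql
/* Run in the target database.
   Nonclustered indexes with zero reads but lots of writes are
   pure maintenance cost: every insert, update, and delete pays
   for them, and no query ever benefits.
   Note: these counters reset on restart. */
SELECT
    OBJECT_NAME(i.object_id) AS table_name,
    i.name AS index_name,
    i.fill_factor,
    ISNULL(ius.user_seeks + ius.user_scans + ius.user_lookups, 0) AS total_reads,
    ISNULL(ius.user_updates, 0) AS total_writes
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS ius
    ON  ius.object_id = i.object_id
    AND ius.index_id = i.index_id
    AND ius.database_id = DB_ID()
WHERE i.type_desc = 'NONCLUSTERED'
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY total_writes DESC;
```

The fill_factor column comes along for free here, which is handy: it lets you spot low fill factors and dead-weight indexes in one pass.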
Putting together the evidence
Alone, none of these data points means much. Even the 15-second I/O warnings could just be happening at night, when no one cares.
But when you put them all together, you can see exactly what the problem is.
Server memory wasn't being used well, for two reasons. A 70% fill factor left nearly a third of every index page empty, so the same data took up far more memory than it needed to. And the unused indexes still had to be read in and maintained by every insert, update, and delete, dragging extra pages through the buffer pool. Both of those contributed to the PAGEIOLATCH_** waits: queries stuck waiting for pages to come back from disk.
Lots of indexes means more objects to lock, which generally means more locking waits.
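Both of those symptoms show up in the wait stats. A minimal sketch of the kind of check we used, assuming the standard sys.dm_os_wait_stats DMV (numbers accumulate since the last restart):

```sql
/* Disk waits (PAGEIOLATCH_%) and lock waits (LCK_M_%) since
   the last restart. High PAGEIOLATCH numbers usually mean the
   buffer pool can't hold the working set, so reads go to disk. */
SELECT TOP (10)
    wait_type,
    waiting_tasks_count,
    wait_time_ms / 1000 AS wait_time_seconds
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGEIOLATCH%'
   OR wait_type LIKE 'LCK_M%'
ORDER BY wait_time_ms DESC;
```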
Because we couldn’t make good use of memory, and we had to go to disk a lot, the poorly chosen 1Gb iSCSI connections were overwhelmed.
To give you an idea of how bad things were: 1Gb Ethernet should be able to move around 125 MB/s, but these iSCSI connections were only moving data at around USB 2 speeds (USB 2 tops out at 480 Mb/s, less than half of what the wire should carry).
Putting together the plan
Most of the problems could be solved with two easy changes: getting rid of unused indexes, and raising fill factor to 100. The size of data SQL Server would regularly have to deal with would drop drastically, so we’d be making better use of memory, and be less reliant on disk.
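The shape of those two changes looks something like this. The object and index names here are placeholders, and you'd want to script out any index before dropping it, in case a quarterly report turns out to miss it:

```sql
/* Hypothetical names. Script the index definition out first,
   so it's easy to put back if something needs it later. */
DROP INDEX ix_orders_unused ON dbo.orders;

/* Rebuild with fill factor 100 so pages are packed full and
   the same data fits in fewer pages, and in less memory. */
ALTER INDEX ix_orders_useful ON dbo.orders
    REBUILD WITH (FILLFACTOR = 100);
```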
Making those two changes fixed 90% of their problems. Those are great short-term wins. There was still some blocking, but user complaints disappeared.
We could make more changes, like adding memory to be even less reliant on disk, and replacing the 1Gb connections, sure. But the more immediate solution didn’t require any downtime, outages, or visits to the data center, and it bought them time to carefully plan those changes out.
If you’re hitting SQL Server problems that you just can’t get a handle on, drop me a line!
Thanks for reading!