One Metric Ton Of Indexes
Let’s say you hate your storage. Let’s say you hate it so much that you want your SQL Server to touch it as little as possible.
You’re most of the people I talk to. Congratulations.
But how do you do that?
Let’s talk about a few things.
How SQL Server Works With Data
It doesn’t matter if a query wants to read or modify data, all those itty-bitty little data pages need to end up in memory.
How much ends up in memory depends on how big your tables are, and how helpful your indexes are.
Likewise, the more indexes you need to modify, the more need to be in memory for that to happen.
You need to design indexes so that you can support your queries by making it easy for them to locate data. That’s your where clause, and guess what?
Your modification queries have where clauses, too.
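As a quick sketch (the column choice here is an assumption, just for illustration), even a delete has to find its rows before it can touch them, and an index can do the finding:

/* Hypothetical archival delete: without an index on ClosedDate,
   every page of the table has to come into memory just to find
   the rows we want to get rid of. */
CREATE INDEX p ON dbo.Posts (ClosedDate);

DELETE p
FROM dbo.Posts AS p
WHERE p.ClosedDate < '20120101';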
How You Can Make Indexing Better
Make sure you’re reviewing your indexes regularly. Things that you need to keep an eye on:
- Duplicative indexes
- Under-utilized indexes
Even when indexes are defined on the same columns, they’re separate sets of pages within your data files.
- If you have indexes that are on very similar sets of columns, or supersets/subsets of columns, it’s probably time to start merging them
- If you have indexes that just aren’t being read, or aren’t being read anywhere near as much as they’re written to, you should think about ditching them (there’s a sketch of that check right after this list)
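If you want a starting point for that last check, here’s a minimal sketch against sys.dm_db_index_usage_stats (keeping in mind the numbers reset when the server restarts, so don’t trust a young server):

/* Indexes that get written to a lot more than they get read from. */
SELECT
    OBJECT_NAME(us.object_id) AS table_name,
    i.name AS index_name,
    us.user_seeks + us.user_scans + us.user_lookups AS total_reads,
    us.user_updates AS total_writes
FROM sys.dm_db_index_usage_stats AS us
JOIN sys.indexes AS i
    ON  i.object_id = us.object_id
    AND i.index_id = us.index_id
WHERE us.database_id = DB_ID()
AND   us.user_updates > (us.user_seeks + us.user_scans + us.user_lookups)
ORDER BY total_writes DESC;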
Cleaning up indexes like this gives you more breathing room to add in other indexes later.
It also gives you far fewer objects competing for space in memory.
That means the ones you have left stand a better chance of staying there, and your queries not having to go to disk for them.
How You Can Make Indexes Better
There are all sorts of things you can do to make indexes better, too. I don’t mean rebuilding them, either!
I mean getting smarter about what you’re indexing.
Things like filtered indexes and index compression can net you big wins when it comes to reducing the overall size of indexes.
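As a sketch (the table and column choices are purely illustrative), the two can even be combined:

/* Only index the rows queries actually ask for, and compress
   the pages so less of the index has to live in memory. */
CREATE INDEX p_open
ON dbo.Posts (OwnerUserId, Score)
WHERE ClosedDate IS NULL
WITH (DATA_COMPRESSION = PAGE);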
My friend Andy Mallon has some Great Posts™ about compression over on his blog:
- Data Compression: How row compression works
- Data Compression: How page compression works
- Data Compression: Costs & benefits
And of course, computed columns can help if you’ve got a wonky schema.
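For example (a hypothetical one, anyway), if queries keep filtering on an expression, a computed column lets you index the expression instead of wrapping the base column in a function at runtime:

/* Queries that constantly filter on the year of CreationDate
   can seek on this instead of scanning and computing. */
ALTER TABLE dbo.Posts
    ADD CreationYear AS YEAR(CreationDate);

CREATE INDEX p_year
ON dbo.Posts (CreationYear);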
Smaller indexes that take up less space in memory make more efficient use of the space you have, which means you can fit more in there.
How You Can Make Tables Better
There are some obvious bits here, like being extra careful with choosing string length.
And of course, overly-wide, non-normalized tables can also lead to issues.
If you’re running an OLTP workload, you may also want to make sure that your critical tables aren’t heaps.
Those things tend to take up more space in memory than they need to.
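Finding them is cheap, at least. Here’s a minimal sketch:

/* Heaps show up in sys.indexes with index_id = 0. */
SELECT
    OBJECT_SCHEMA_NAME(i.object_id) AS schema_name,
    OBJECT_NAME(i.object_id) AS table_name
FROM sys.indexes AS i
WHERE i.index_id = 0
AND   OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1;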
And of course, if you need any help fixing these types of issues, drop me a line!
Thanks for reading!
It’s likely obvious that your join clauses should also be SARGable. Doing something like this is surely just covering up for some daft data quality issues.
SELECT COUNT_BIG(*) AS records
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON ISNULL(p.OwnerUserId, 0) = u.Id;
If 0 has any real meaning here, replace the NULLs with zeroes already. Doing it at runtime is a chore for everyone.
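In practice, that’s a one-time backfill (probably batched, on a table of any size), and then the join predicate gets to be a plain column comparison:

/* Backfill once... */
UPDATE p
    SET p.OwnerUserId = 0
FROM dbo.Posts AS p
WHERE p.OwnerUserId IS NULL;

/* ...and then the join is SARGable. */
SELECT COUNT_BIG(*) AS records
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id;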
Other things can be thought of as “SARGable” too, though perhaps we need a better word for it.
I don’t have one, but let’s define it as the ability for a query to take advantage of index ordering.
World War Three
There are no Search ARGuments here. There’s no argument at all.
But we can plainly see queries invoking functions on columns going all off the rails.
Here’s an index. Please enjoy.
CREATE INDEX c ON dbo.Comments(Score);
Now, let’s write the same query twice: once well, once poorly. Second verse, same as the first.
SELECT TOP(1) c.*
FROM dbo.Comments AS c
ORDER BY c.Score DESC;

SELECT TOP(1) c.*
FROM dbo.Comments AS c
ORDER BY ISNULL(c.Score, 0) DESC;
The plan for the first one! Yay!
Look at those goose eggs. Goose Gossage. Nolan Ryan.
The plan for the second one is far less successful.
We’ve done our query a great disservice.
Grouping queries, depending on scope, can also suffer from this. This example isn’t as drastic, but it’s a simple query that still exhibits a decent comparative difference.
SELECT c.Score
FROM dbo.Comments AS c
GROUP BY c.Score
HAVING COUNT_BIG(*) < 0;

SELECT ISNULL(c.Score, 0) AS Score
FROM dbo.Comments AS c
GROUP BY ISNULL(c.Score, 0)
HAVING COUNT_BIG(*) < 0;
To get you back to drinking, here’s both plans.
We have, once again, created more work for ourselves. Purely out of vanity.
Put yourself in SQL Server’s place here. Maybe the optimizer, maybe the storage engine. Whatever.
If you had to do this work, how would you prefer to do it? Even though I think ISNULL should have better support, it applies to every other function too.
Would you rather:
- Process data in the order an index presents it and group/order it
- Process data by applying some additional calculation to it and then grouping/ordering
That’s what I thought.
Thanks for reading!
Secret Agent Plan
My dislike for implicit transactions is well documented. Recently, while working with a client, I noticed that they had a bunch of them causing blocking.
Digging in a little bit further, I noticed they were all coming from an Agent Job. Of course, by default, Agent runs with a bunch of wacked-out ANSI options.
To get the job to perform better — which it did — it had to make use of a filtered index on an archival task. If you scroll way down in that doc, you’ll see a note:
Review the required SET options for filtered index creation in CREATE INDEX (Transact-SQL) syntax
In order to create, or have queries use your filtered index, they need to have very specific options set correctly.
Rather than just setting the required options, which was apparently a lot of typing, someone had just set all the ANSI defaults on.
SET ANSI_DEFAULTS ON;
But this comes with some additional baggage, in the form of implicit transactions. If you run DBCC USEROPTIONS; with that turned on:
Set Option              Value
----------------------- --------------
textsize                2147483647
language                us_english
dateformat              mdy
datefirst               7
statistics XML          SET
lock_timeout            -1
quoted_identifier       SET
arithabort              SET
ansi_null_dflt_on       SET
ansi_defaults           SET
ansi_warnings           SET
ansi_padding            SET
ansi_nulls              SET
concat_null_yields_null SET
cursor_close_on_commit  SET
implicit_transactions   SET   <---- UwU what's this
isolation level         read committed
It sets all the things you actually need, plus a couple of extras: implicit transactions and cursor close on commit.
Of course, had someone just done a bit more typing, all would have been well and good.
SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET ARITHABORT ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET QUOTED_IDENTIFIER ON;
SET ANSI_DEFAULTS OFF; is equally disappointing, sort of.
Set Option              Value
----------------------- --------------
textsize                2147483647
language                us_english
dateformat              mdy
datefirst               7
lock_timeout            -1
arithabort              SET
concat_null_yields_null SET
isolation level         read committed
It really does just flip everything off. Not that I’m saying it shouldn’t — but maybe we need a command in between?
SET ANSI_DEFAULTS BACK_TO_NORMAL; or something.
Whatever “normal” means.
Thanks for reading!
No, It’s New
When I was checking out early builds of SQL Server 2019, I noticed a new DMV called dm_db_missing_index_group_stats_query, that I thought was pretty cool.
It helped you tie missing index requests to the queries that requested them. Previously, that took a whole lot of heroic effort, or luck.
With this new DMV, it’s possible to combine queries that look for missing indexes with queries that look for tuning opportunities in the plan cache or in Query Store.
It seems to tie back to dm_db_missing_index_groups: the index_group_handle column in that DMV joins to the group_handle column in the new one.
If you’re wondering why I’m not giving you any code samples here, it’s because I’m going to get some stuff built into sp_BlitzIndex to take advantage of it, now that it’s documented.
Thanks for reading!
I talk to a lot of people about performance tuning. It seems like once someone is close enough to a database for long enough, they’ll have some impression of parameter sniffing. Usually a bad one.
You start to hear some funny stuff over and over again:
- We should always recompile
- We should always use local variables
- We should always recompile and use local variables
Often, even if it means writing unsafe dynamic SQL, people will be afraid to parameterize things.
To some degree, I get it. You’re afraid of incurring some new performance problem.
You’ve had the same mediocre performance for years, and you don’t wanna make something worse.
The thing is, you could be making things a lot better most of the time.
- Fewer compiles and recompiles, fewer single-use plans, fewer queries with multiple plans
- Avoiding the local variable nonsense is, more often than not, going to get you better performance (there’s a sketch of that right after this list)
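To put a face on the local variable thing, here’s a minimal sketch (the procedure names are made up): the local variable version hides the runtime value from the optimizer, so it guesses off average density instead of using the histogram, while the parameterized version gets a plan built for an actual value:

/* The anti-pattern: copying a parameter into a local variable.
   The optimizer can't see @Filtered's runtime value at compile time. */
CREATE OR ALTER PROCEDURE dbo.CommentsByUser_Local (@UserId INT)
AS
BEGIN
    DECLARE @Filtered INT = @UserId;

    SELECT c.Score
    FROM dbo.Comments AS c
    WHERE c.UserId = @Filtered;
END;
GO

/* The plain version: the parameter gets sniffed, and the plan is
   built for an actual value, which is usually what you want. */
CREATE OR ALTER PROCEDURE dbo.CommentsByUser (@UserId INT)
AS
BEGIN
    SELECT c.Score
    FROM dbo.Comments AS c
    WHERE c.UserId = @UserId;
END;
GO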
A Letter To You
I’m going to tell you something that you’re not going to like, here.
Most of the time when I see a parameter sniffing problem, I see a lot of other problems.
Shabbily written queries, obvious missing indexes, and a whole list of other things.
It’s not that you have a parameter sniffing problem, you have a general negligence problem.
After all, the bad kind of parameter sniffing means that you’ve got variations of a query plan that don’t perform well on variations of parameters.
Once you start taking care of the basics, you’ll find a whole lot less of the problems that keep you up at night.
If that’s the kind of thing you need help with, drop me a line.
Thanks for reading!
Find Your Data First
Most queries will have a where clause. I’ve seen plenty that don’t. Some of ’em have surprised the people who developed them far more than they surprised me.
But let’s start there, because it’s a pretty important factor in how you design your indexes. There are all sorts of things that indexes can help with, but the first thing we want indexes to do in general is help us locate data.
Why? Because the easier we can locate data, the easier we can eliminate rows early on in the query plan. I’m not saying we always need to have an index seek, but we generally want to filter out rows we don’t care about when we’re touching the table they’re in.
When we carry excess rows throughout the query plan, all sorts of things get impacted and can become less efficient. This goes hand in hand with cardinality estimation.
At the most severe, rows can’t be filtered when we touch tables, or even join them together, and we have to filter them out later.
When that happens, it’s probably not your indexes that are the problem — it’s you.
You, specifically. You and your awful query.
We can take a page from the missing index request feature here: helping queries find the rows we care about should be a priority.
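In its simplest form, helping a query find data looks like this (the index name and reputation number are arbitrary):

/* The index puts Reputation in order, so the query can seek
   straight to the rows it cares about and ignore the rest. */
CREATE INDEX u ON dbo.Users (Reputation) INCLUDE (DisplayName);

SELECT u.Id, u.DisplayName
FROM dbo.Users AS u
WHERE u.Reputation > 100000;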
Sweet N’ Low
When people talk about the order predicates are evaluated in, the easiest way to influence that is with the order of columns in the key of your index.
Since that defines the sort order of the index, if you want a particular column to be evaluated first, put it first in the key of the index.
Selectivity is a decent attribute to consider, but not the end all be all of index design.
Equality predicates preserve ordering of other key columns in the index, which may or may not become important depending on what your query needs to accomplish.
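Here’s what that preserved ordering buys you, with column choices that are purely illustrative:

/* OwnerUserId is the equality predicate, so within any single user,
   rows are already sorted by CreationDate. The index can be read
   backwards to satisfy the DESC, with no Sort operator in the plan. */
CREATE INDEX p ON dbo.Posts (OwnerUserId, CreationDate);

SELECT TOP(10) p.Id, p.CreationDate
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
ORDER BY p.CreationDate DESC;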
After the where clause, there are some rather uncontroversial things that indexes can help with:
- Joins
- Grouping
- Ordering
Of course, they help with these because indexes put data in order.
Having rows in a deterministic order makes the above things either much easier (joining and grouping), or free (ordering).
How we decide on key column order necessarily has to take each part of the query involved into account.
If a query is so complicated that creating one index to help it would mean a dozen key columns, you probably need to break things down further:
- Use a #temp table to materialize a less complicated portion (there’s a sketch of that after this list)
- Get yourself a nonclustered columnstore index to read from
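The #temp table route might look something like this, as a rough sketch (the dates and columns are arbitrary):

/* Materialize the simpler, filtered shape first... */
SELECT p.Id, p.OwnerUserId, p.Score
INTO #recent_posts
FROM dbo.Posts AS p
WHERE p.CreationDate >= '20130101';

/* ...index the part the next join cares about... */
CREATE CLUSTERED INDEX c ON #recent_posts (OwnerUserId);

/* ...and then join the narrow result to everything else. */
SELECT u.DisplayName, SUM(rp.Score) AS total_score
FROM #recent_posts AS rp
JOIN dbo.Users AS u
    ON u.Id = rp.OwnerUserId
GROUP BY u.DisplayName;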
When you’re trying to figure out a good index for one query, you usually want to start with the where clause.
Not always, but it makes sense in most cases because it’s where you can find gains in efficiency.
If your index doesn’t support your where clause, you’re gonna see an index scan and freak out and go in search of your local seppuku parlor.
After that, look to other parts of your query that could help you eliminate rows. Joins are an obvious choice, and typically make good candidates for index key columns.
At this point, your query might be in good enough shape, and you can leave other things alone.
If so, great! You can make the check out to cache. I mean cash.
Thanks for reading!