Tuning I/O Is Often About Tuning Indexes

One Metric Ton Of Indexes


Let’s say you hate your storage. Let’s say you hate it so much that you want you SQL Serve to touch it as little as possible.

You’re most of the people I talk to. Congratulations.

But how do you do that?

Let’s talk about a few things.

How SQL Server Works With Data


It doesn’t matter if a query wants to read or modify data, all those itty-bitty little data pages need to end up in memory.

How much ends up in memory depends on how big your tables are, and how helpful your indexes are.

Likewise, the more indexes you need to modify, the more need to be in memory for that to happen.

You need to design indexes so that you can support your queries by making it easy for them to locate data. That’s your where clause, and guess what?

Your modification queries have where clauses, too.

How You Can Make Indexing Better


Make sure you’re reviewing your indexes regularly. Things that you need to keep an eye on:

  • Duplicative indexes
  • Under-utilized indexes

Even when indexes are defined on the same columns, they’re separate sets of pages within your data files.

  • If you have indexes that are on very similar sets of columns, or supersets/subsets of columns, it’s probably time to start merging them
  • If you have indexes that just aren’t being read, or aren’t being read anywhere near as much as they’re written to, you should think about ditching them

Cleaning up indexes like this gives you more breathing room to add in other indexes later.

It also gives you far fewer objects competing for space in memory.

That means the ones you have left stand a better chance of staying there, and your queries not having to go to disk for them.

How You Can Make Indexes Better


There are all sorts of things you can do to make indexes better, too. I don’t mean rebuilding them, either!

I mean getting smarter about what you’re indexing.

Things like filtered indexes and index compression can net you big wins when it comes to reducing the overall size of indexes.

My friend Andy Mallon has some Great Posts™ about compression over on his blog:

And of course, computed columns can help if you’ve got a wonky schema.

Smaller indexes that take up less space in memory make more efficient use of the space you have, which means you can fit more in there.

How You Can Make Tables Better


There are some obvious bits here, like being extra careful with choosing string length.

LOB data can lead to weird locking, and mess with memory grants.

And of course, overly-wide, non-normalized tables can also lead to issues.

If you’re running an OLTP workload, you may also want to make sure that your critical tables aren’t heaps.

Those things tend to take up more space in memory than they need to.

And of course, if you need any help fixing these types of issues, drop me a line!

Thanks for reading!

SARGable Isn’t Just For Your Where Clause

Maybe Probably


It’s likely also obvious that your join clauses should also be SARGable. Doing something like this is surely just covering up for some daft data quality issues.

SELECT
    COUNT_BIG(*) AS records
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON ISNULL(p.OwnerUserId, 0) = u.Id;

If 0 has any real meaning here, replace the NULLs with zeroes already. Doing it at runtime is a chore for everyone.

But other things can be thought of as “SARGable” too. But perhaps we need a better word for it.

I don’t have one, but let’s define it as the ability for a query to take advantage of index ordering.

World War Three


There are no Search ARGuments here. There’s no argument at all.

But we can plainly see queries invoking functions on columns going all off the rails.

Here’s an index. Please enjoy.

CREATE INDEX c ON dbo.Comments(Score);

Now, let’s write a query. Once well, once poorly. Second verse, same as the first.

SELECT TOP(1)
    c.*
FROM dbo.Comments AS c
ORDER BY 
    c.Score DESC;

SELECT TOP(1)
    c.*
FROM dbo.Comments AS c
ORDER BY 
    ISNULL(c.Score, 0) DESC;

The plan for the first one! Yay!

inky

Look at those goose eggs. Goose Gossage. Nolan Ryan.

The plan for the second one is far less successful.

trashy vampire

We’ve done our query a great disservice.

Not Okay


Grouping queries, depending on scope, can also suffer from this. This example isn’t as drastic, but it’s a simple query that still exhibits as decent comparative difference.

SELECT 
    c.Score
FROM dbo.Comments AS c
GROUP BY 
    c.Score
HAVING 
    COUNT_BIG(*) < 0;

SELECT 
    ISNULL(c.Score, 0) AS Score
FROM dbo.Comments AS c
GROUP BY 
    ISNULL(c.Score, 0)
HAVING 
    COUNT_BIG(*) < 0;

To get you back to drinking, here’s both plans.

the opposite of fur

We have, once again, created more work for ourselves. Purely out of vanity.

Indexable


Put yourself in SQL Server’s place here. Maybe the optimizer, maybe the storage engine. Whatever.

If you had to do this work, how would you prefer to do it? Even though I think ISNULL should have better support, it applies to every other function too.

Would you rather:

  • Process data in the order an index presents it and group/order it
  • Process data by applying some additional calculation to it and then grouping/ordering

That’s what I thought.

Thanks for reading!

Be Careful With SET ANSI_DEFAULTS ON

Secret Agent Plan


My dislike for implicit transactions is well documented. Recently, while working with a client, I noticed that they had a bunch of them causing blocking.

Digging in a little bit further, I noticed they were all coming from an Agent Job. Of course, by default, Agent runs with a bunch of wacked-out ANSI options.

To get the job to perform better — which it did — it had to make use of a filtered index on an archival task. If you scroll way down in that doc, you’ll see a note:

Review the required SET options for filtered index creation in CREATE INDEX (Transact-SQL) syntax

In order to create, or have queries use your filtered index, they need to have very specific options set correctly.

Baggage


Rather than just setting the required options, which was apparently a lot of typing, someone had just set all the ANSI defaults on.

SET ANSI_DEFAULTS ON;

But this comes with some additional baggage, in the form of implicit transactions. If you run DBCC USEROPTIONS; with that turned on:

Set Option              Value
----------------------- --------------
textsize                2147483647
language                us_english
dateformat              mdy
datefirst               7
statistics XML          SET
lock_timeout            -1
quoted_identifier       SET
arithabort              SET
ansi_null_dflt_on       SET
ansi_defaults           SET
ansi_warnings           SET
ansi_padding            SET
ansi_nulls              SET
concat_null_yields_null SET
cursor_close_on_commit  SET
implicit_transactions   SET <---- UwU what's this
isolation level         read committed

It sets all the things you actually need, plus a couple other options for implicit transactions and cursor close on commit.

Baggage


Of course, had someone just done a bit more typing, all would have been well and good.

SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET ARITHABORT ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET QUOTED_IDENTIFIER ON;

Using SET ANSI_DEFAULTS OFF;is equally disappointing, sort of.

Set Option              Value
----------------------- --------------
textsize                2147483647
language                us_english
dateformat              mdy
datefirst               7
lock_timeout            -1
arithabort              SET
concat_null_yields_null SET
isolation level         read committed

It really does just flip everything off. Not that I’m saying it shouldn’t — but maybe we need a command in between?

SET ANSI_DEFAULTS BACK_TO_NORMAL; or something.

Whatever “normal” means.

Thanks for reading!

Documentation for dm_db_missing_index_group_stats_query

No, It’s New


When I was checking out early builds of SQL Server 2019, I noticed a new DMV called dm_db_missing_index_group_stats_query, that I thought was pretty cool.

It helped you tie missing index requests to the queries that requested them. Previously, that took a whole lot of heroic effort, or luck.

With this new DMV, it’s possible to combine queries that look for missing indexes with queries that look for tuning opportunities in the plan cache or in Query Store.

It seems to tie back to dm_db_missing_index_groups, on the index_group_handle column in this DMV joined to the group handle column in the new DMV.

If you’re wondering why I’m not giving you any code samples here, it’s because I’m going to get some stuff built into sp_BlitzIndex to take advantage of it, now that it’s documented.

Special thanks to William Assaf (b|t) for helping to get this done.

Thanks for reading!

Parameter Sniffing Is Usually A Good Thing

Tick Tock


I talk to a lot of people about performance tuning. It seems like once someone is close enough to a database for long enough, they’ll have some impression of parameter sniffing. Usually a bad one.

You start to hear some funny stuff over and over again:

  • We should always recompile
  • We should always use local variables
  • We should always recompile and use local variables

Often, even if it means writing unsafe dynamic SQL, people will be afraid to parameterize things.

Between Friends


To some degree, I get it. You’re afraid of incurring some new performance problem.

You’ve had the same mediocre performance for years, and you don’t wanna make something worse.

The thing is, you could be making things a lot better most of the time.

  • Fewer compiles and recompiles, fewer single-use plans, fewer queries with multiple plans
  • Avoiding the local variable nonsense is, more often than not, going to get you better performance

A Letter To You


I’m going to tell you something that you’re not going to like, here.

Most of the time when I see a parameter sniffing problem, I see a lot of other problems.

Shabbily written queries, obvious missing indexes, and a whole list of other things.

It’s not that you have a parameter sniffing problem, you have a general negligence problem.

After all, the bad kind of parameter sniffing means that you’ve got variations of a query plan that don’t perform well on variations of parameters.

Once you start taking care of the basics, you’ll find a whole lot less of the problems that keep you up at night.

If that’s the kind of thing you need help with, drop me a line.

Thanks for reading!

A General Indexing Strategy For Normal Queries

Find Your Data First


Most queries will have a where clause. I’ve seen plenty that don’t. Some of’em have surprised the people who developed them far more than they surprised me.

But let’s start there, because it’s a pretty important factor in how you design your indexes. There are all sorts of things that indexes can help, but the first thing we want indexes to do in general is help us locate data.

Why? Because the easier we can locate data, the easier we can eliminate rows early on in the query plan. I’m not saying we always need to have an index seek, but we generally want to filter out rows we don’t care about when we’re touching the table they’re in.

Burdens


When we carry excess rows throughout the query plan, all sorts of things get impacted and can become less efficient. This goes hand in hand with cardinality estimation.

At the most severe, rows can’t be filtered when we touch tables, or even join them together, and we have to filter them out later.

I wrote about that here and here.

When that happens, it’s probably not your indexes that are the problem — it’s you.

You, specifically. You and your awful query.

We can take a page from the missing index request feature here: helping queries find the rows we care about should be a priority.

Sweet N’ Low


When people talk about the order predicates are evaluated in, the easiest way to influence that is with the order of columns in the key of your index.

Since that defines the sort order of the index, if you want a particular column to be evaluated first, put it first in the key of the index.

Selectivity is a decent attribute to consider, but not the end all be all of index design.

Equality predicates preserve ordering of other key columns in the index, which may or may not become important depending on what your query needs to accomplish.

Post Where


After the where clause, there are some rather uncontroversial things that indexes can help with:

  • Joins
  • Grouping
  • Ordering

Of course, they help with this because indexes put data in order.

Having rows in a deterministic order makes the above things either much easier (joining and grouping), or free (ordering).

How we decide on key column order necessarily has to take each part of the query involved into account.

If a query is so complicated that creating one index to help it would mean a dozen key columns, you probably need to break things down further.

Minnow


When you’re trying to figure out a good index for one query, you usually want to start with the where clause.

Not always, but it makes sense in most cases because it’s where you can find gains in efficiency.

If your index doesn’t support your where clause, you’re gonna see an index scan and freak out and go in search of your local seppuku parlor.

After that, look to other parts of your query that could help you eliminate rows. Joins are an obvious choice, and typically make good candidates for index key columns.

At this point, your query might be in good enough shape, and you can leave other things alone.

If so, great! You can make the check out to cache. I mean cash.

Thanks for reading!

Reconsidering Missing Index Requests

Milk Carton


Part of reviewing any server necessarily includes reviewing indexes. When you’re working through things that matter, like unused indexes, duplicative indexes, heaps, etc. it’s pretty clear cut what you should do to fix them.

Missing indexes are a different animal though. You have three general metrics to consider with them:

  • Uses: the number of times a query could have used the index
  • Impact: how much the optimizer thinks it can reduce the cost of the query by
  • Query cost: How much the optimizer estimates the query will cost to run

Of those metrics, impact and query cost are entirely theoretical. I’ve written quite a bit about query costing and how it can be misleading. If you really wanna get into it, you can watch the whole series here.

In short: you might have very expensive queries that finish very quickly, and you might have very low cost queries that finish very slowly.

Especially in cases of parameter sniffing, a query plan with a very low cost might get compiled and generate a missing index request. What happens if every other execution of that query re-uses the cheaply-costed plan and runs for a very long time?

You might have a missing index request that looks insignificant.

Likewise, impact is how much the optimizer thinks it can reduce the cost of the current plan by. Often, you’ll create a new index and get a totally different plan. That plan may be more or less expensive that the previous plan. It’s all a duck hunt.

The most reliable of those three metrics is uses. I’m not saying it’s perfect, but there’s a bit less Urkeling there.

When you’re looking at missing index requests, don’t discount those with lots of uses for low cost queries. Often, they’re more important than they look.

Thanks for reading!