Starting SQL: Wrapping It All Up

IN THE BEGINNING


I started writing this series as a beginner’s guide to some of the oddities in SQL Server, and to address some of the incorrectness in the world when it comes to database practitioner-ing. It was enjoyable to write, but I think two months is probably enough.

Sometimes you have to clear a lot of material out before you can start on different stuff. Like burning dead brush or something. That’s my only advice about writing.

I hope you’ve enjoyed it, and you’ve learned some stuff along the way.

Anyway, for your convenience, here’s the full list.

Starting SQL: A Month Of Fundamentals Posts
Starting SQL: Written Queries vs Executed Queries
Starting SQL: Indexes Store Data In Order
Starting SQL: Measuring A Query
Starting SQL: Messing Up A Query
Starting SQL: Why Didn’t You Use My Index?
Starting SQL: A Little TOP Can Have A Big Sort
Starting SQL: Quick Notes About Heaps
Starting SQL: Picking Clustered Indexes
Starting SQL: The Ins Of Nonclustered Indexes
Starting SQL: The Outs Of Nonclustered Indexes
Starting SQL: Key-dependent Relationships
Starting SQL: Rethinking Key Column Order
Starting SQL: Sorts and Memory
Starting SQL: Index Wrap Up
Starting SQL: Waits And Measures
Starting SQL: Some Waits Relate
Starting SQL: Compensating For CPU Waits
Starting SQL: Compensating For Disk Waits
Starting SQL: Compensating For Memory Waits
Starting SQL: Compensating For Lock Waits
Starting SQL: What’s So Great About Query Plans, Anyway?
Starting SQL: Actual Plans Are Actually Important
Starting SQL: Things Query Plans Can’t Tell You
Starting SQL: Things Query Plans Might Tell You
Starting SQL: Memory Grants In Execution Plans
Starting SQL: Why Your Query Can’t Go Parallel, Table Variables Edition
Starting SQL: Why Your Query Can’t Go Parallel, Scalar Functions Edition
Starting SQL: Why Wouldn’t An Index Help?
Starting SQL: Missing Index Blind Spots
Starting SQL: Eager Index Spools
Starting SQL: I Don’t Care How Many Rows Are In Your Table Variable
Starting SQL: Things Your Execution Plan Will Warn You About
Starting SQL: Things Your Execution Plan Should Warn You About
Starting SQL: Query Plan Round Up
Starting SQL: What Are Parameters Made Of?
Starting SQL: What Happens When You Don’t Parameterize Queries?
Starting SQL: Persistent Problems With Parameters
Starting SQL: Parameter Sniffing Because Of Insufficient Indexes
Starting SQL: Fixing Parameter Sniffing
Starting SQL: Wrapping It All Up
Starting SQL: Why Not Make Everything Dynamic?
Starting SQL: Dynamic SQL And Temporary Objects
Starting SQL: Unparameterized Strings In Dynamic SQL
Starting SQL: A Little More Fun With Logging Dynamic SQL

Thanks for reading!

Starting SQL: A Little More Fun With Logging Dynamic SQL

Jammin Two


In case you missed it, because you probably missed it, a long time ago I wrote a post about logging dynamic SQL to a table to track execution history.

And while I still like that post, I wanted to give it another go with a little more chutzpah. It’s a little bit more complicated too, but hey.

To avoid doing version control in blog posts, I’ve got this script up on GitHub. If there are any issues, please raise them over there. Remember the last sentence about not doing version control in blog posts? Thanks for playing along.

I fully admit this is mostly a bad idea, but it was a fun one to go on with.

Half The Battle


See, and see very very well. Getting the proc to put stuff in the table was easy, but… I also wanted the query plan.

Why? The plan cache is an unstable wreck on a lot of servers I look at, and a lot of people are still totally unaware of Query Store, or totally not on a version of SQL Server that supports it. And besides, I enjoy forms of self-torture.

Of course, getting the query plan was an exercise in limitations. It’s uh. Well, you’ll see.

Here’s what I ended up having to do: I had to get the query plan inside the dynamic SQL.

DECLARE @sql NVARCHAR(MAX) = N'';
SET @sql += N'
SELECT COUNT_BIG(*) AS records
/*dbo.logging_test*/
FROM dbo.Badges AS b
JOIN dbo.Users AS u
    ON b.UserId = u.Id
WHERE u.Reputation > @i;

SELECT @query_plan = detqp.query_plan
FROM sys.dm_exec_requests AS der
CROSS APPLY sys.dm_exec_text_query_plan(der.plan_handle, 0, -1) AS detqp
WHERE der.session_id = @@SPID;
';

See, the dynamic SQL executes in a separate context, but the same SPID. If I try to do this outside of the dynamic SQL, I get the wrong plan. I get the plan for the outer proc, which doesn’t tell me anything. But now I have to pass the query plan out of the dynamic SQL, too.

EXEC sys.sp_executesql @sql, 
                       N'@i INT, @query_plan XML OUTPUT', 
                       @i, 
                       @query_plan = @query_plan OUTPUT;

And what really sucks? It also gives me back the plan for the DMV query to get the plan.

So we gotta modify XML. What a treat.

SET @query_plan.modify('
declare namespace p = "http://schemas.microsoft.com/sqlserver/2004/07/showplan";
delete //p:StmtSimple[2]')

If you have more than one statement in there, you’d have to change the number in the brackets. Of course, if you’re executing more than one query inside the dynamic SQL, you’ve got worse ideas than me.

This Is Not A Great Post


So if you really wanna do this, you’ll have to call everything like so:

DECLARE @guid  UNIQUEIDENTIFIER;
DECLARE @query_plan XML;
EXEC dbo.logging @spid = @@SPID, @sql = @sql, @query_plan = NULL, @guid_in = NULL, @guid_out = @guid OUTPUT;

EXEC sys.sp_executesql @sql, 
                       N'@i INT, @query_plan XML OUTPUT', 
                       @i, 
                       @query_plan = @query_plan OUTPUT;

SET @query_plan.modify('
declare namespace p = "http://schemas.microsoft.com/sqlserver/2004/07/showplan";
delete //p:StmtSimple[2]')

EXEC dbo.logging @spid = @@SPID, @sql = @sql, @query_plan = @query_plan, @guid_in = @guid, @guid_out = NULL;

Sure, I probably could have done something with SCOPE_IDENTITY() here, but why miss an excuse to use a GUID and alienate most right-thinking people?

End Result


Making sure it works:

EXEC dbo.logging_test @i = 0;
GO

SELECT * FROM dbo.logger AS l ORDER BY l.run_date;
GO

I can hear you asking yourself: why is this in a series of posts called “Starting SQL” when it seems like a bunch of advanced techniques?

It’s because I want to show you that not all good ideas have good implementations, and it can be genuinely difficult to get good information without a lot of changes and overhead. A lot of times, it’s just not worth it, and this is a great example of that.

A valuable lesson to anyone getting started with SQL Server: sometimes it’s just not worth it.

Thanks for reading!


Starting SQL: Unparameterized Strings In Dynamic SQL

Roaster


It doesn’t take much to let a bad person get at your data. I mean, the internet. Right? What a mistake.

Most of the time, you need to parameterize your code to avoid SQL injection. Hooray, we did it.

But there are some aspects to dynamic SQL where you can’t use parameters, like database, schema, table and column names. Worse, if you need to pass in or build a list of columns, you can’t possibly parameterize each one.

These strings are where the biggest potential for problems lies, though. They’ll often be declared at some long variable length to safeguard against string truncation, which leaves plenty of room to tack on whatever additional payload you want the query to execute.

Not Dropping, Dumping


When it comes to permissions, it’s somewhat easier to disallow an app login from dropping databases or tables. Of course, it’s a bit of a joke expecting that sort of granularity from most applications.

They’ll all wanna do something that requires that privilege occasionally, so they’ll be granted the privilege perpetually.

What a nice time that is.

But let’s face it, dropping things is at best a temporary disruption. Stealing data is forever, especially if you can just keep stealing it. For example, this query can be used to get the results of what’s in sys.tables along with regular query results. It doesn’t take a rocket surgeon to figure out how things line up.

DECLARE @SQLString NVARCHAR(MAX) = N'';
DECLARE @Filter NVARCHAR(MAX) = N'';
DECLARE @nl NCHAR(2) = NCHAR(13) + NCHAR(10);
DECLARE @Title NVARCHAR(250) =  N''' 
UNION ALL 
SELECT t.object_id, 
       t.schema_id, 
       t.name, 
       SCHEMA_NAME(t.schema_id),
       t.create_date,
       t.modify_date,
       NULL
FROM sys.tables AS t --';

SET @SQLString = @SQLString + 
N'
SELECT TOP (5000) 
    p.OwnerUserId,
    p.Score,
    p.Tags,
    p.Title, 
    p.CreationDate, 
    p.LastActivityDate, 
    p.Body
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656 ' + @nl;

IF @Title IS NOT NULL 
   BEGIN SET @Filter = @Filter + N'   AND p.Title LIKE ''' + N'%' + @Title + N'%''' + @nl; END;

IF @Filter IS NOT NULL
   BEGIN SET @SQLString += @Filter; END;

SET @SQLString += N'   ORDER BY p.Score DESC;';

PRINT @SQLString;
EXEC (@SQLString);

And of course, after sys.tables you have sys.columns, and once you know which columns are in which table you want to go after, the rest is soft cheese.

Zero Injection Policy


If we wanted to not have that happen, we could write our SQL like this instead:

DECLARE @SQLString NVARCHAR(MAX) = N'';
DECLARE @Filter NVARCHAR(MAX) = N'';
DECLARE @nl NCHAR(2) = NCHAR(13) + NCHAR(10);
DECLARE @Title NVARCHAR(250) =  N''' 
UNION ALL 
SELECT t.object_id, 
       t.schema_id, 
       t.name, 
       SCHEMA_NAME(t.schema_id),
       t.create_date,
       t.modify_date,
       NULL
FROM sys.tables AS t --';

SET @SQLString = @SQLString + 
N'
SELECT TOP (5000) 
    p.OwnerUserId,
    p.Score,
    p.Tags,
    p.Title, 
    p.CreationDate, 
    p.LastActivityDate, 
    p.Body
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656 ' + @nl;

IF @Title IS NOT NULL 
   BEGIN SET @Filter = @Filter + N'   AND p.Title LIKE N''%'' + @Title + N''%'' ' + @nl; END;

IF @Filter IS NOT NULL
   BEGIN SET @SQLString += @Filter; END;

SET @SQLString += N'   ORDER BY p.Score DESC;';

PRINT @SQLString;
EXEC sys.sp_executesql @SQLString, 
                       N'@Title NVARCHAR(250)',
                       @Title;

The difference in output is quite apparent: the UNION ALL payload is now just a string value held by a parameter, instead of part of the executed query.

But What About!


Yes, all those object names. Terrible about them, huh?

Of course, we have QUOTENAME to save us from those, and examples of it aren’t terribly interesting. This time, you get a link to the docs page.
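Even so, here’s a minimal sketch of the idea, with a hypothetical payload smuggled into a table name:

DECLARE @schema sysname = N'dbo';
DECLARE @table  sysname = N'Posts; DROP TABLE dbo.Posts; --';

/* concatenated raw, the payload becomes part of the statement */
PRINT N'SELECT COUNT_BIG(*) AS records FROM '
    + @schema + N'.' + @table + N';';

/* QUOTENAME brackets the whole value (and doubles any ] inside it),
   so it can only ever be interpreted as an object name */
PRINT N'SELECT COUNT_BIG(*) AS records FROM '
    + QUOTENAME(@schema) + N'.' + QUOTENAME(@table) + N';';

Execute the first string and you lose a table. Execute the second and you get an invalid object name error, which is a much better day.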

Thanks for reading!


Starting SQL: Dynamic SQL And Temporary Objects

The Blink Of An Eye


Temporary objects are a necessity for just about every workload I’ve ever seen. One cannot trust the optimizer with overly large and complex queries.

At some point, you’ve gotta break things up, down, or sideways, in order to stabilize a result set for better cardinality estimation on one (or both) sides of the query.

But there are some tricks and oddities around how and where you can use temporary objects in dynamic SQL.

It’s important to consider scope, and object type, when dealing with both at once.

Since they’re my least favorite, let’s start with table variables.

Well, It Beats String Splitting


Under normal circumstances, you can’t pass table variables into dynamic SQL, nor can you declare a table variable outside of dynamic SQL and use it inside.

Trying to do either one of these things will result in an error!

DECLARE @crap TABLE(id INT);
DECLARE @sql NVARCHAR(MAX) = N'SELECT COUNT(*) AS records FROM @crap;'
EXEC sp_executesql @sql;
GO
DECLARE @crap TABLE(id INT);
DECLARE @sql NVARCHAR(MAX) = N'SELECT COUNT(*) AS records FROM @crap;'
EXEC sp_executesql @sql, N'@crap TABLE (id INT)', @crap;
GO

A big, stupid, milk-brained error. But you can do it with a User Defined Type:

CREATE TYPE crap AS TABLE(id INT);
GO 

DECLARE @crap AS crap;
DECLARE @sql NVARCHAR(MAX) = N'SELECT COUNT(*) AS records FROM @crap;'
EXEC sp_executesql @sql, N'@crap crap READONLY', @crap;
GO

In the same way that you can pass Table Valued Parameters into stored procedures, you can pass them into dynamic SQL, too. That’s pretty handy for various reasons.

But passing one out, no dice.

DECLARE @crap AS crap;
DECLARE @sql NVARCHAR(MAX) = N'DECLARE @crap AS crap;'
EXEC sp_executesql @sql, N'@crap crap OUTPUT', @crap = @crap OUTPUT;
GO 

There are cooler tricks you can do with dynamic SQL and table variables, though.

But of course, it might be even easier to use a temp table, so here we go.

I Don’t See Nothing Wrong


Of course, with temp tables, there is no problem using them with inner dynamic SQL:

CREATE TABLE #gold(id INT);
DECLARE @sql NVARCHAR(MAX) = N'SELECT COUNT(*) AS records FROM #gold;'
EXEC sp_executesql @sql;
DROP TABLE #gold;
GO

But we don’t find nearly as much joy doing things in reverse.

DECLARE @sql NVARCHAR(MAX) = N'CREATE TABLE #gold(id INT);'
EXEC sp_executesql @sql;
SELECT COUNT(*) AS records FROM #gold;
DROP TABLE #gold;
GO

That’s why, rather than create a UDT, which gives you another dependency with not a lot of upside, people will just dump the contents of a TVP into a temp table, and use that inside dynamic SQL.

It’s a touch less clunky. Plus, with everything we know about table variables, it might not be such a great idea using them.
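A rough sketch of that dump-and-go pattern, reusing the crap type from above (the proc name is made up):

CREATE OR ALTER PROCEDURE dbo.tvp_dumper (@crap crap READONLY)
AS
BEGIN

    /* stage the TVP in a temp table so the dynamic SQL can see it */
    SELECT c.id
    INTO #crap_stage
    FROM @crap AS c;

    DECLARE @sql NVARCHAR(MAX) = N'SELECT COUNT(*) AS records FROM #crap_stage;';
    EXEC sys.sp_executesql @sql;

END;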

I’ve covered a way of creating temp tables dynamically before, so I won’t rehash it here, but that’s another neat trick you can do with temp tables.

Thanks for reading!


Starting SQL: Why Not Make Everything Dynamic?

Problem Solver


I think it was Doug Lane who coined the stages of dynamic SQL. One of them dealt with the fact that once you start using it, you just wanna use it everywhere, even places where you think you don’t need it.

Most of the time, you don’t. A parameter is good enough. But like we saw last week, sometimes parameters can backfire, and you can use dynamic SQL to save query performance.

That’s one great use, but it’s not one you’ll need constantly. Most of the time when you’re using dynamic SQL, it’s not going to be to correct performance. It’ll be because:

  • You want to build or use a different string based on some input
  • You have a statement you want to execute over multiple targets
  • Your script has to support multiple versions of SQL Server

Of course, one can’t reasonably write about dynamic SQL in SQL Server without linking to Erland. Boy howdy, did I learn most everything I know from him.

I Disconnect From You


One of the big downsides of dynamic SQL is that statements in a stored procedure are no longer associated with that stored procedure in system DMVs.

Of course, you can address this somewhat by adding a comment to the query inside the dynamic SQL:

/*headless*/
DECLARE @super_cool_sql NVARCHAR(MAX) = N'
SELECT * 
FROM ALL THE TABLES!
';

/*more headed*/
DECLARE @super_cool_sql NVARCHAR(MAX) = N'
SELECT * 
/*super cool procedure name*/
FROM ALL THE TABLES!
';

Where you put the comment is irrelevant, but if it’s a particularly long query, I’d probably want it close to or above the select so it doesn’t get truncated.

But we’re all nice people who don’t write queries with more than 65k characters and spaces in them.

Right? Right.

While it’s nice to know where they came from, they’re not easy to track down because they don’t have a parent object id — they’re rogue agents in the plan cache.

It can also make troubleshooting a stored procedure difficult because it can be a little harder to see which statements did the most work.

  • You might be conditionally executing certain blocks of dynamic SQL for different reasons
  • You might be building dynamic where clauses that have different performance profiles
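Since the comment travels with the statement text, one thing you can do is search the plan cache for it. A rough sketch, using the tag from the example above:

SELECT st.text,
       qs.execution_count,
       qs.total_worker_time,
       qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE st.text LIKE N'%super cool procedure name%'
AND   st.text NOT LIKE N'%dm_exec_query_stats%'; /* don't match this query itself */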

Other Thans


The first thing I want to cover outright is that IF branching without dynamic SQL does not work. Flat out.

Yes, you can control the flow of logic, but it has no desirable impact on query plan compilation. Everything gets compiled the first time.

Instead of dynamic SQL, though, you could use separate stored procedures, which at least makes the objects a little easier to track down in the plan cache or Query Store.

CREATE OR ALTER PROCEDURE dbo.VoteCount (@VoteTypeId INT, @YearsBack INT)
AS
BEGIN

IF @VoteTypeId IN (2, 1, 3, 5, 10, 6)
BEGIN
    EXEC dbo.VoteCount_InnerBig @VoteTypeId, @YearsBack;
END;

IF @VoteTypeId IN (16, 15, 11, 8, 9, 7, 12, 4)
BEGIN
    EXEC dbo.VoteCount_InnerSmall @VoteTypeId, @YearsBack;
END;

END;

Both of those stored procedures can have the same statement in them, without the ? = (SELECT ?) addition needed with the dynamic SQL option.

That they’re owned by different objects is enough to get them separate optimization paths. You’re also a bit less prone to permissions issues, if you’re the kind of person who takes those seriously. But if your app logs in as db_owner or whatever, well, BOFL with that.

Speaking of which, even though I find security incredibly dull and frustrating, let’s talk a little about how people can take advantage of bad dynamic SQL.

Thanks for reading!


Starting SQL: Fixing Parameter Sniffing

Mostoftentimes


There are good solutions for alleviating parameter sniffing. Ones I commonly use are:

  • Temp tables
  • Changing indexes
  • Dynamic SQL

There isn’t really a good place to put a temp table in this stored proc, though I did blog about a similar technique before.

It would certainly be a good candidate for index changes though, because the first thing we need to address is that key lookup.

It’s a sensitive issue.

King Index


We’re going to walk through something I talked about what seems like an eternity ago. Why? Because it has practical application here.

When you look at the core part of the query, PostId is only in the select list. Most advice around key lookups (including, generally, my own) is to consider putting columns only in the output into the includes of the index.


But we’re in a slightly different situation, here.

SELECT v.VoteTypeId,
       v.PostId,
       COUNT_BIG(v.PostId) AS TotalPosts,
       COUNT_BIG(DISTINCT v.PostId) AS UniquePosts
FROM dbo.Votes AS v
WHERE v.CreationDate >= DATEADD(YEAR, (-1 * @YearsBack), '2014-01-01')
AND   v.VoteTypeId = @VoteTypeId
GROUP BY v.VoteTypeId,
         v.PostId

We’re getting a distinct count, and SQL Server has some choices for coming up with that.

If we follow the general advice here and create this index, we’ll end up in trouble:

CREATE INDEX v 
    ON dbo.Votes(VoteTypeId, CreationDate) INCLUDE(PostId)
    WITH (DROP_EXISTING = ON);

Since the Stream Aggregate expects ordered data, and PostId isn’t in order in the index (because includes aren’t in any particular order), we need to sort it. For a small amount of data, that’s fine. For a large amount of data, it’s not.

There is a second Sort in the plan further down, but it’s on the count expression, which means we can’t index it without adding in additional objects, like an indexed view.

SELECT TOP (1000) 
    x.VoteTypeId,
    x.PostId,
    x.TotalPosts,
    x.UniquePosts
FROM 
    (
        SELECT v.VoteTypeId,
               v.PostId,
               COUNT_BIG(v.PostId) AS TotalPosts, -- this is the expression
               COUNT_BIG(DISTINCT v.PostId) AS UniquePosts
        FROM dbo.Votes AS v 
        WHERE v.CreationDate >= DATEADD(YEAR, (-1 * @YearsBack), '2014-01-01')
        AND   v.VoteTypeId = @VoteTypeId
        GROUP BY v.VoteTypeId,
                 v.PostId
    ) AS x
ORDER BY x.TotalPosts DESC; -- this is the ordering

What’s An Index To A Non-Believer?


A better index in this case looks like this:

CREATE INDEX v 
    ON dbo.Votes(VoteTypeId, PostId, CreationDate)
    WITH (DROP_EXISTING = ON);

It will shave about 6 seconds off the run time, but there’s still a problem when the “big” data doesn’t go parallel.

When the plan goes parallel, it’s about 4x faster than the serial version. Now I know what you’re thinking, here. We could use OPTIMIZE FOR to always get the plan for the big value. And that’s not a horrible idea — the small data parameter runs very quickly re-using the parallel plan here — but there’s another way.
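For reference, the hint version would look something like this, with 2 standing in for one of the “big” values (a sketch, not the route we’re taking, and the proc name is made up):

CREATE OR ALTER PROCEDURE dbo.VoteCount_OptimizeFor (@VoteTypeId INT, @YearsBack INT)
AS
BEGIN

SELECT v.VoteTypeId,
       v.PostId,
       COUNT_BIG(v.PostId) AS TotalPosts,
       COUNT_BIG(DISTINCT v.PostId) AS UniquePosts
FROM dbo.Votes AS v
WHERE v.CreationDate >= DATEADD(YEAR, (-1 * @YearsBack), '2014-01-01')
AND   v.VoteTypeId = @VoteTypeId
GROUP BY v.VoteTypeId,
         v.PostId
/* always optimize as if a "big" value came in */
OPTION(OPTIMIZE FOR (@VoteTypeId = 2));

END;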

Let’s look at our data.

Don’t Just Stare At It


Let’s draw an arbitrary line. I think a million is a popular number. I wish it was a popular number in my bank account, but you know.

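A simple count will show where that line falls:

SELECT v.VoteTypeId,
       COUNT_BIG(*) AS records
FROM dbo.Votes AS v
GROUP BY v.VoteTypeId
ORDER BY records DESC;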

I know we’re ignoring the date column data, but this is good enough for now. There’s only so much I can squeeze into one blog post.

The point here is that we’re going to say that anything under a million rows is okay with using the small plan, and anything over a million rows needs the big plan.

Sure, we might need to refine that later if there are outliers within those two groups, but this is a blog post.

How do we do that? We go dynamic.

Behike 54


Plain ol’ IF branches plain ol’ don’t work. We need something to get two distinct plans that are re-usable.

Here’s the full procedure:

CREATE OR ALTER PROCEDURE dbo.VoteCount (@VoteTypeId INT, @YearsBack INT)
AS
BEGIN

DECLARE @sql NVARCHAR(MAX) = N'';

SET @sql += N'
SELECT TOP (1000) 
    x.VoteTypeId,
    x.PostId,
    x.TotalPosts,
    x.UniquePosts
/*dbo.VoteCount*/
FROM 
    (
        SELECT v.VoteTypeId,
               v.PostId,
               COUNT_BIG(v.PostId) AS TotalPosts,
               COUNT_BIG(DISTINCT v.PostId) AS UniquePosts
        FROM dbo.Votes AS v 
        WHERE v.CreationDate >= DATEADD(YEAR, (-1 * @YearsBack), ''2014-01-01'')
        AND   v.VoteTypeId = @VoteTypeId '

IF @VoteTypeId IN (2, 1, 3, 5, 10, 6)
BEGIN
    SET @sql += N'
        AND 1 = (SELECT 1)'
END

IF @VoteTypeId IN (16, 15, 11, 8, 9, 7, 12, 4)
BEGIN
    SET @sql += N'
        AND 2 = (SELECT 2)'
END

SET @sql += N'
        GROUP BY v.VoteTypeId,
                 v.PostId
    ) AS x
ORDER BY x.TotalPosts DESC;
';

RAISERROR('%s', 0, 1, @sql) WITH NOWAIT;

EXEC sys.sp_executesql @sql, 
                       N'@VoteTypeId INT, @YearsBack INT', 
                       @VoteTypeId, @YearsBack;

END;

There’s a bit going on in there, but the important part is in the middle. This is what will give us different execution plans.

IF @VoteTypeId IN (2, 1, 3, 5, 10, 6)
BEGIN
    SET @sql += N'
        AND 1 = (SELECT 1)'
END

IF @VoteTypeId IN (16, 15, 11, 8, 9, 7, 12, 4)
BEGIN
    SET @sql += N'
        AND 2 = (SELECT 2)'
END

Sure, there are other ways to do this. You could even selectively recompile if you wanted to. But some people complain when you recompile. It’s cheating.

Because the SQL Server Query Optimizer typically selects the best execution plan for a query, we recommend only using hints as a last resort for experienced developers and database administrators.

See? It’s even documented.
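If you did want to go the selective recompile route anyway, here’s a minimal sketch of the idea, boiled down to a standalone query rather than the full procedure, where only the values on the small side of the line pay the compile cost:

DECLARE @VoteTypeId INT = 4;

DECLARE @sql NVARCHAR(MAX) = N'
SELECT COUNT_BIG(*) AS records
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId';

/* only the rare values get a fresh plan every time */
IF @VoteTypeId IN (16, 15, 11, 8, 9, 7, 12, 4)
BEGIN
    SET @sql += N'
OPTION(RECOMPILE)';
END;

SET @sql += N';';

EXEC sys.sp_executesql @sql, N'@VoteTypeId INT', @VoteTypeId;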

Now that we’ve got that all worked out, we can run the proc and get the right plan depending on the amount of data we need to shuffle around.


Little Star


Now I know what you’re thinking. You wanna know more about that dynamic SQL. You want to solve performance problems and have happy endings.

We’ll do that next week, where I’ll talk about common issues, best practices, and more tricks you can use to get queries to perform better with it.

Thanks for reading!


Starting SQL: Parameter Sniffing Because Of Insufficient Indexes

Down And Out


There are lots of different ways that parameter sniffing can manifest in the operators chosen, the order of operators, and the resources acquired by a query when a plan is compiled. At least in my day-to-day consulting, one of the most common reasons for plans being disagreeable is insufficient indexes.

One way to fix the issue is to fix the index. We’ll talk about a way to do it without touching the indexes tomorrow.

Let’s say we have this index to start with. Maybe it was good for another query, and no one ever thought twice about it. After all, you rebuild your indexes every night, what other attention could they possibly need?

CREATE INDEX v 
    ON dbo.Votes(VoteTypeId, CreationDate);

If we had a query with a where clause on those two columns, it’d be able to find data pretty efficiently.

But how much data will it find? How many of each VoteTypeId are there? What range of dates are we looking for?

Well, that depends on our parameters.

Cookie Cookie


Here’s our stored procedure. There’s one column in it that isn’t in our index. What a bummer.

CREATE OR ALTER PROCEDURE dbo.VoteCount (@VoteTypeId INT, @YearsBack INT)
AS
BEGIN

SELECT TOP (1000) 
    x.VoteTypeId,
    x.PostId,
    x.TotalPosts,
    x.UniquePosts
FROM 
    (
        SELECT v.VoteTypeId,
               v.PostId,
               COUNT_BIG(v.PostId) AS TotalPosts,
               COUNT_BIG(DISTINCT v.PostId) AS UniquePosts
        FROM dbo.Votes AS v
        WHERE v.CreationDate >= DATEADD(YEAR, (-1 * @YearsBack), '2014-01-01')
        AND   v.VoteTypeId = @VoteTypeId
        GROUP BY v.VoteTypeId,
                 v.PostId
    ) AS x
ORDER BY x.TotalPosts DESC;

END;

That doesn’t matter for a small amount of data, whether it’s encountered because of the parameters used, or the size of the data the procedure is developed and tested against. Testing against unrealistic data is a recipe for disaster, of course.

Cookie Cookie


What can be tricky is that if the sniffing is occurring with the lookup plan, the optimizer won’t think enough of it to request a covering index, either in plan or in the index DMVs. It’s something you’ll have to figure out on your own.


So we need to add that to the index, but where? That’s an interesting question, and we’ll answer it in tomorrow’s post.

Thanks for reading!


Starting SQL: Persistent Problems With Parameters

Pick A Database, Any Database


Alright, maybe not any database. Let’s stick with SQL Server. That’s the devil we know.

At some point in your life, you’re going to construct a query that takes user input, and that input is likely going to come in the form of a parameter.

It could be a stored procedure, dynamic SQL, or something from your application. But there it is.

Waiting. Watching.

Sniffing.

Defining A Problem


When we use parameters, we re-use execution plans, at least until a Qualifying Event™ occurs.

What’s a qualifying event?

  • Recompile hint
  • Stats update
  • Temp table modification threshold
  • Plan eviction
  • Server restart

Now, it might be reasonable to think that a mature optimizer — and it is an optimizer, not just a planner — would be able to do something a bit more optimal. After all, why would anyone think it would take the same amount of work to get through 100 rows as it would take to get through 1,000,000 rows? It’s a fundamentally different approach.

Doing a run to the grocery store to replenish a few things requires a far different mindset from going to prepare for a large family meal. You have to choose between a basket or a cart, whether you can jump right to the couple spots you need or you need to walk up and down every aisle, and even if you might need to write down a list because it doesn’t fit into brain memory.

One might also have the expectation that if a significant inaccuracy is detected at runtime, the strategy might change. While that does sort of happen with Adaptive Joins, it’s not a full plan rewrite.

Detecting A Problem


The plan cache usually sucks for this, unless you’re saving the data off to more stable tables. Why? Because most people only figure out they’ve been sniffed after a plan changes, which means it’s not in the cache anymore. You know, when end users start complaining, the app goes unresponsive, you can’t connect to the server, etc.

You could set your watch to it.

But sometimes it’s there. Some funny looking little plan that looks quite innocent, but seems to do a lot of work when you bang it up against other DMVs.

If you have the luxury, Query Store is a much better tool for detecting plan changes. It’s even got reports built in just for that.
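If you want to poke at it without the reports, a rough sketch against the Query Store DMVs will surface queries that have collected more than one plan:

SELECT qsq.query_id,
       COUNT_BIG(DISTINCT qsp.plan_id) AS plan_count
FROM sys.query_store_query AS qsq
JOIN sys.query_store_plan AS qsp
    ON qsp.query_id = qsq.query_id
GROUP BY qsq.query_id
HAVING COUNT_BIG(DISTINCT qsp.plan_id) > 1
ORDER BY plan_count DESC;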


For the extra fancy amongst you, I pray that your expensive monitoring tool has a way to tell you when query plans change, or when normally fast plans deviate from that.

Deciphering A Problem


This is where things can get difficult, unless you’re monitoring or logging information. You typically need a few different combinations of parameter values to feed in to your query, so you can see what changed and when. Quite often, there’s no going back easily.

Let’s say you had a plan, and it was a good plan. Then one of those pesky qualifying events comes along, and it’s decided that you need a new plan.

And what if… that new plan is worse? No matter how much you recompile or update stats or toggle with cardinality estimation, you just can’t go back to the way things were without lots of hints or changes to the query? Maybe that’s not parameter sniffing. Maybe that’s parameter snuffing. I’m gonna copyright that.

Most parameter sniffing will result in a plan with a set of bad choices for different amounts of data, and a row estimate that’s way off for one of them.

This isn’t a “bad estimate” — it was a pretty good estimate for the first parameter value. It just wasn’t a good estimate for the second parameter value.

And to be honest, quite a bit of parameter sniffing issues come from Nested Loops. Not because it’s bad, but because it’s bad for large amounts of data, especially in a serial plan. It’s a pretty easy way to gum up a query, though. Make it get stuck in a loop for 13 million rows. It wasn’t fast? No kidding. Poof, be gone.

But then opposite-land isn’t good, either.


This plan probably makes plenty of sense for a big chunk of data. One big scan, one big hash, one big sort. Done.

Of course, for a small amount of data, we go from taking 1ms to taking 2s. People with small amounts of data will likely not be happy with that. Your server might not be either, what with all the extra CPU resources we’re using in this here parallel plan all the time now.

Tomorrow, we’ll look at how sometimes you can fix parameter sniffing with better indexes.

Thanks for reading!


Starting SQL: What Happens When You Don’t Parameterize Queries?

Nothing, Nothing, Nothing


Aside from the obvious perils of SQL Injection, parameterizing queries can be helpful in other ways.

Not to downplay SQL Injection at all, it’s just that fixing code is only one part of the equation. Vendor applications often need to do some pretty administrative things, and require elevated permissions.

Just last week I was looking at an application that had a stored procedure which took a parameter that was the name of an executable. There were no checks on the name. You could put in quite not-figuratively anything you wanted, and xp_cmdshell would run it.

All the parameterization in the world won’t help that.

Well, you get what you get.

Of Parameters And Plans


This can happen in three common ways:

  • You have an interface that accepts user-entered values
  • You have an ORM where you haven’t explicitly defined parameter types and precisions
  • You have dynamic SQL where values are concatenated into strings

Since I spend all my time in the database, I’m going to show you the third one. I wouldn’t normally format a query like this, but I’m trying to keep it web-friendly.

DECLARE @StartDate DATETIME = '20130101';
WHILE @StartDate < '20140101'
BEGIN
DECLARE @NoParams4u NVARCHAR(MAX) = 
N'
SELECT SUM(c.Score) AS TotalScore
FROM dbo.Comments AS c
WHERE c.CreationDate 
    BETWEEN CONVERT(DATETIME, ''' + RTRIM(@StartDate) + ''') 
    AND CONVERT(DATETIME, ''' + RTRIM(DATEADD(DAY, 11, @StartDate)) + ''')
AND 1 = (SELECT 1);
';

EXEC sys.sp_executesql @NoParams4u;
RAISERROR('%s', 0, 1, @NoParams4u);

SET @StartDate = DATEADD(DAY, 11, @StartDate);

END
GO

To make things interesting, I’ve created a non-covering index on the Comments table:

CREATE INDEX c ON dbo.Comments(CreationDate);

I’ve also had to use an oddly specific day increment in order to get some good plan variety because of that index. If you ever wonder why some blog posts take three hours to write, the most likely cause is finding the right number.

The first thing you should notice is that even using sp_executesql doesn’t help when our code isn’t parameterized. The second thing you’ll probably notice is 1 = (SELECT 1) at the end of the query. Past me has answers to all those questions, if you click on the links.

Bouquet


If I run that loop, I get back 34 results. Each query gets an individually compiled query plan, though there are only three “different” plans used.


Notice that each plan has different literal date values passed in to it, and different estimates. If we created a covering index, we’d get plan stability across executions, but then we’d still have to compile it when these literal values get passed in. I did it like this to reinforce my point.
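If you want to test the plan stability claim yourself, the covering index is a quick change. A sketch, since Score is the only other column the query touches:

CREATE INDEX c 
    ON dbo.Comments(CreationDate) INCLUDE(Score)
    WITH (DROP_EXISTING = ON);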

We can validate that by looking in the plan cache and surrounding DMVs, using sp_BlitzCache.

Planimal Activist


If you have code like this, one good way to find culprits is by running it like this:

EXEC sp_BlitzCache @SortOrder = 'query hash';

In the results, which show just the top 10 queries, you can see a warning about multiple plans, and that each query has a single execution.

I mentioned before that if we add a covering index, we’ll get plan stability. That’s true, but specific to this demo on SQL Server 2019, there’s an alternate plan available for executions that qualify for Batch Mode On Rowstore (BMOR).

In prior versions of SQL Server, with row mode only plans, we could only have a Stream Aggregate. But even getting the same plan most of the time, we still need to compile it every time. SQL Server still thinks each of these queries is “new” and needs to get a plan compiled.

Fixing It


There are two options for fixing this. If you need to do it quickly, at scale, the database-level option FORCED PARAMETERIZATION can take care of most of these problems. Just make sure you read up on the limitations. To fix it for just a couple problem queries, you need to fix the dynamic SQL.
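For the at-scale route, flipping the switch is a single statement, with StackOverflow standing in for whatever your database is called:

ALTER DATABASE StackOverflow SET PARAMETERIZATION FORCED;

As for fixing the dynamic SQL itself: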

DECLARE @NoParams4u NVARCHAR(MAX) = 
N'
SELECT SUM(c.Score) AS TotalScore
FROM dbo.Comments AS c
WHERE c.CreationDate BETWEEN @StartDate AND DATEADD(DAY, 11, @StartDate)
AND 1 = (SELECT 1);
';

EXEC sys.sp_executesql @NoParams4u, N'@StartDate DATETIME', @StartDate;

I’m not concatenating values into the string anymore, and I’m passing the @StartDate value in when I execute the dynamic SQL.

One point I want to make is that it’s generally safe to do date math on the parameter. I’m not doing date math on the column, which would generally be a bad idea.
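To make the distinction concrete, here’s a sketch of the difference, using a made-up @EndDate parameter:

DECLARE @EndDate DATETIME = '20130112';

/* date math on the parameter: the index on CreationDate can still be seeked */
SELECT SUM(c.Score) AS TotalScore
FROM dbo.Comments AS c
WHERE c.CreationDate >= DATEADD(DAY, -11, @EndDate);

/* date math on the column: it has to run for every row, so no seek */
SELECT SUM(c.Score) AS TotalScore
FROM dbo.Comments AS c
WHERE DATEADD(DAY, 11, c.CreationDate) >= @EndDate;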

But anyway, now our plan gets used 34 times.


In this case, plan reuse works out well. Every query is looking at a sufficiently narrow range of data to have it not matter, and the longest running execution is around 31ms.

But what about when that doesn’t work out? When can parameterization backfire? We’ll find out tomorrow!

Thanks for reading!


Starting SQL: What Are Parameters Made Of?

Parameter Positive


There are many good reasons to parameterize a query, but there are also trade-offs. There’s no such thing as a free parameter, as they say.

In this post, we’re going to discuss what is and isn’t a parameter, and some of the pros and cons.

What’s important to keep in mind is that good indexing can help avoid many of the cons, but not all. Bad indexing, of course, causes endless problems.

There are many good reasons to parameterize your queries, too. Avoiding SQL injection is a very good reason.

But then!

What’s Not A Parameter


It can be confusing to people who are getting started with SQL Server, because parameters and variables look exactly the same.

They both start with @, and feel pretty interchangeable. They behave the same in many ways, too, except when it comes to cardinality estimation.

To generalize a bit, though, something is a parameter if it belongs to an object. An object can be an instance of:

  • A stored procedure
  • A function
  • Dynamic SQL

Things that aren’t parameters are things that come into existence when you DECLARE them. Of course, you can pass things you declare to one of the objects above as parameters. For example, there’s a very big difference between these two blocks of code:

DECLARE @VoteTypeId INT = 7;

SELECT *
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId;

DECLARE @sql NVARCHAR(MAX) = N'
SELECT *
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId;
'
EXEC sp_executesql @sql, N'@VoteTypeId INT', @VoteTypeId;

But it’s not obvious until you look at the query plans, where the guess for the declared variable is god awful.

Then again, if you read the post I linked to up there, you already knew that. Nice how that works.

If you’re too lazy to click, I’m too lazy to repeat myself.


What’s the point? Variables, things you declare, are treated differently from parameters, things that belong to a stored procedure, function, or dynamic SQL.

Parameter Problems


The problem with parameterization is one of familiarity. It not only breeds contempt, but… sometimes data just grows apart.

Really far apart.

SELECT 
    v.VoteTypeId, 
    COUNT_BIG(*) AS records
FROM dbo.Votes AS v
GROUP BY v.VoteTypeId
ORDER BY records;

Natural Selection


When you parameterize queries, you give SQL Server permission to remember, and more importantly, to re-use.

What it re-uses is the execution plan, and what it remembers are cardinality estimates. If we do something like this, we don’t get two different execution plans, or even two different sets of guesses, even though the values that we’re feeding to each query have quite different distributions in our data.
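Something like this, say: the same parameterized statement executed back to back, first with a common value, then a rare one (a sketch mirroring the role-reversed block further down):

DECLARE @VoteTypeId INT;

SET @VoteTypeId = 7;

DECLARE @sql NVARCHAR(MAX) = N'
SELECT *
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId;
';
EXEC sp_executesql @sql, N'@VoteTypeId INT', @VoteTypeId;

SET @VoteTypeId = 16;

SET @sql = N'
SELECT *
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId;
';
EXEC sp_executesql @sql, N'@VoteTypeId INT', @VoteTypeId;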

The result is two query plans that look quite alike, but behave quite differently.


One takes 23 milliseconds. The other takes 1.5 seconds. Would anyone complain about this in real life?

Probably not, but it helps to illustrate the issue.

Leading Miss


Where this can get confusing is when you’re trying to diagnose a performance problem.

If you look in the plan cache, or in query store, you’ll see the plan that gets cached for the very first parameter. It’ll look simple and innocent, sure. But the problem is with a totally different parameter that isn’t logged anywhere.

You might also face a different problem, where the query recompiles because you restarted the server, updated stats, rebuilt indexes, or enough rows in the table changed to trigger an automatic stats update. If any of those things happen, the optimizer will wanna come up with a new plan based on whatever value goes in first.

If the roles get reversed, the plan will change, but they’ll both take the same amount of time now.

DECLARE @VoteTypeId INT;

SET @VoteTypeId = 16

DECLARE @sql NVARCHAR(MAX) = N'
SELECT *
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId;
';
EXEC sp_executesql @sql, N'@VoteTypeId INT', @VoteTypeId;

SET @VoteTypeId = 7;

SET @sql = N'
SELECT *
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId;
';
EXEC sp_executesql @sql, N'@VoteTypeId INT', @VoteTypeId;

Deal With It 😎


In the next few posts, we’ll talk about what happens when you don’t parameterize queries, and different ways to deal with parameter sniffing.

  • A recompile hint can help, but it might not always be appropriate depending on execution frequency and plan complexity
  • Optimize for unknown hints will give you the bad variable guess we saw at the very beginning of this post (both hints are sketched below)
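For reference, here’s what those two hints look like, bolted onto the query from earlier:

DECLARE @sql NVARCHAR(MAX);
DECLARE @VoteTypeId INT = 7;

/* recompile: a fresh plan on every execution, at a CPU cost */
SET @sql = N'
SELECT *
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId
OPTION(RECOMPILE);
';
EXEC sp_executesql @sql, N'@VoteTypeId INT', @VoteTypeId;

/* optimize for unknown: the same "average" guess a declared variable gets */
SET @sql = N'
SELECT *
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId
OPTION(OPTIMIZE FOR UNKNOWN);
';
EXEC sp_executesql @sql, N'@VoteTypeId INT', @VoteTypeId;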

We’re going to need more clever and current ways to fix the issue. If you’re stuck on those things recompiling or unknown-ing, you’re stuck not only on bad ideas, but outdated bad ideas.

Like duck l’orange and Canadian whiskey.

Thanks for reading!
