Parameterization – Page 2

Indexed Views As Filtered Indexes In SQL Server

Posted on January 19, 2021August 4, 2022 by Erik Darling

Pssst!

If you landed here from Brent’s weekly links, use this link to get my training for 90% off.

The access is for life, but this coupon code isn’t! Get it while it lasts.

Discount applies at checkout, and you have to purchase everything for it to apply.

A Persistent Frustration

SQL Server comes with some great features for tuning queries:

Computed Columns
Filtered Indexes
Indexed Views

But there’s an interoperability issue when you try to use things together. You can’t create a filtered index with the filter definition on a computed column, nor can you create a filtered index on an indexed view.

If you find yourself backed into a corner, you may need to consider using an indexed view without any aggregation (which is the normal use-case).

Empty Tables

If we try to do something like this, we’ll get an error.

DROP TABLE IF EXISTS dbo.indexed_view;
GO

CREATE TABLE dbo.indexed_view
(
    id int PRIMARY KEY,
    notfizzbuzz AS (id * 2)
);
GO

CREATE INDEX n 
    ON dbo.indexed_view (notfizzbuzz) 
WHERE notfizzbuzz = 0;
GO

Yes, I’m putting the error message here for SEO bucks.

Msg 10609, Level 16, State 1, Line 19
Filtered index 'nfb' cannot be created on table 'dbo.indexed_view' because the column 'notfizzbuzz' in the filter expression is a computed column. 
Rewrite the filter expression so that it does not include this column.

An Indexed View Doesn’t Help

If we run this to create an indexed view on top of our base table, we still can’t create a filtered index, but there’s a different error message.

CREATE OR ALTER VIEW dbo.computed_column
WITH SCHEMABINDING
AS
SELECT
    iv.id, 
    iv.notfizzbuzz
FROM dbo.indexed_view AS iv;
GO 

CREATE UNIQUE CLUSTERED INDEX c 
    ON dbo.computed_column(id);

CREATE INDEX nfb 
    ON dbo.computed_column(notfizzbuzz) 
WHERE notfizzbuzz = 0;

Msg 10610, Level 16, State 1, Line 37
Filtered index 'nfb' cannot be created on object 'dbo.computed_column' because it is not a user table. 
Filtered indexes are only supported on tables. 
If you are trying to create a filtered index on a view, consider creating an indexed view with the filter expression incorporated in the view definition.

But what a thoughtful error message it is! Thanks, whomever wrote that.

Still Needs Help

We can create this indexed view just fine.

CREATE OR ALTER VIEW dbo.computed_column
WITH SCHEMABINDING
AS
SELECT
    iv.id, 
    iv.notfizzbuzz
FROM dbo.indexed_view AS iv
WHERE iv.notfizzbuzz = 0;
GO 

CREATE UNIQUE CLUSTERED INDEX c 
    ON dbo.computed_column(id);

But if we try to select from it, the view is expanded.

SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc
WHERE cc.notfizzbuzz = 0;

The issue here is the simple parameterization that is attempted with the trivial plan.

If we run the query like this, and look at the end of the output, we’ll see a message at the bottom that our query is safe for auto (simple) parameterization. This may still happen even if the plan doesn’t remain trivial (more detail at the link above!)

DBCC FREEPROCCACHE;
GO 
DBCC TRACEON(8607, 3604);
GO 
SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc
WHERE cc.notfizzbuzz = 0;
DBCC TRACEOFF(8607, 3604);
GO 

********************
** Query marked as Cachable
** Query marked as Safe for Auto-Param
********************

Making It Work

The two ways we can run this query to get the indexed view to be used are like so:

SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc WITH(NOEXPAND)
WHERE cc.notfizzbuzz = 0;


SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc
WHERE cc.notfizzbuzz = 0
AND   1 = (SELECT 1);

A Closer Look

If we put those two queries through the ringer, we’ll still see auto (simple) parameterization from the first query:

DBCC FREEPROCCACHE;
GO 
DBCC TRACEON(8607, 3604);
GO 
SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc WITH(NOEXPAND)
WHERE cc.notfizzbuzz = 0;
GO 
DBCC TRACEOFF(8607, 3604);
GO 

********************
** Query marked as Cachable
** Query marked as Safe for Auto-Param
********************

DBCC FREEPROCCACHE;
GO 
DBCC TRACEON(8607, 3604);
GO 
SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc
WHERE cc.notfizzbuzz = 0
AND   1 = (SELECT 1);
GO 
DBCC TRACEOFF(8607, 3604);
GO 

********************
** Query marked as Cachable
********************

It’s goofy, but it’s worth noting. Anyway, if I had to pick one of these methods to get the plan I want, it would be the NOEXPAND version.

Using that hint is the only thing that will allow for statistics to get generated on indexed views.

In case you’re wondering, marking the computed column as PERSISTED doesn’t change the outcome for any of these issues.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Considerations For Implementing Soft Deletes In SQL Server

Posted on January 12, 2021May 16, 2022 by Erik Darling

From The Beginning

Implementing soft deletes for an app that’s been around for a while can be tough. In the same way as implementing Partitioning can be tough to add in later to get data management value from (rebuilding clustered indexes on the scheme, making sure all nonclustered indexes are aligned, and all future indexes are too, and making sure you have sufficient partitions at the beginning and end for data movement).

Boy, I really stressed those parentheses out.

If you do either one from the outset, it’s far less painful to manage. The structural stuff is there for you from the beginning, and you can test different strategies early on before data change become difficult to manage.

Queries

The first and most obvious thing is that all your queries now need to only find data that isn’t deleted.

Almost universally, it’s easier to put views on top of tables that have the appropriate bit search for deleted or not deleted rows than to expect people to remember it.

CREATE VIEW dbo.Users_Active
AS
SELECT 
    u.*
FROM dbo.Users AS u
WHERE u.is_deleted = 0;

CREATE VIEW dbo.Users_Inactive
AS
SELECT 
    u.*
FROM dbo.Users AS u
WHERE u.is_deleted = 1;

It’s not that views have any magical performance properties; they’re just queries after all, but it gives you an explicit data source.

Indexes

Depending on how your other queries search for data, you may need to start accounting for the is_deleted flag in your indexes. This could make a really big difference if the optimizer stops choosing your narrower nonclustered indexes because it hates key lookups.

Typically, other predicates will give you a selective-enough result set that a residual predicate on a bit field won’t make much difference. If you’ve already got a seek to the portion of data you’re interested in and most of it will be not-deleted, who cares?

And let’s be honest, in most implementations deleted rows will be the minority of data, and searches for it will be far less common. Usually it’s just there for an occasional audit.

In adjacent cases where instead of deleted you need to designate things as currently active, and you may have many inactive rows compared to active rows, filtered indexes can be your best friend.

Coming back to the views, I don’t think that making them indexed is necessary by default, but it might be if you’re using forced parameterization and filtered indexes.

CREATE TABLE dbo.Users(id int, is_deleted bit);
GO 

CREATE INDEX u ON dbo.Users (id) WHERE is_deleted = 0;
GO

SELECT 
    u.id, u.is_deleted
FROM dbo.Users AS u
WHERE u.is_deleted = 0;

Under simple parameterization, this can be fine. Under forced parameterization, things can get weird.

Tables and Tables

In some cases, it might be easier to create tables specifically for deleted rows so you don’t have unnecessary data in your main tables. You can implement this easily enough with after triggers. Just make sure they’re designed to handle multiple rows.

If you want something out of the box, you might mess with:

Temporal tables
Change Data Capture
Change Tracking

However, none of those help you deal with who deleted rows. For that, you’ll need an Audit.

Thanks for reading!

Going Further

Defeating Parameter Sniffing With Dynamic SQL In SQL Server

Posted on December 24, 2020May 15, 2022 by Erik Darling

Enjoy!

Thanks for watching!

Going Further

A Suggestion To Make Key Lookups Less Of A Performance Problem In SQL Server

Posted on December 18, 2020May 15, 2022 by Erik Darling

Odor Of Gas

One problem with Lookups, aside from the usual complaints, is that the optimizer has no options for when the lookup happens.

If the optimizer decides to use a nonclustered index to satisfy some part of the query, but the nonclustered index doesn’t have all of the columns needed to cover what the query is asking for, it has to do a lookup.

Whether the lookup is Key or RID depends on if the table has a clustered index, but that’s not entirely the point.

The point is that there’s no way for the optimizer to decide to defer the lookup until later in the plan, when it might be more opportune.

Gastric Acid

Let’s take one index, and two queries.

CREATE INDEX p
    ON dbo.Posts(PostTypeId, Score, CreationDate)
    INCLUDE(OwnerUserId);

Stop being gross.

SELECT TOP (1000)
    u.DisplayName,
    p.*
FROM dbo.Posts AS p
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 5
ORDER BY p.CreationDate DESC;

SELECT TOP (1000)
    u.DisplayName,
    p.*
FROM dbo.Posts AS p
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 6
ORDER BY p.CreationDate DESC;

The main point here is not that the lookup is bad; it’s actually good, and I wish both queries would use one.

If we hint the first query to use the nonclustered index, things turn out better.

SELECT TOP (1000)
    u.DisplayName,
    p.*
FROM dbo.Posts AS p WITH(INDEX = p)
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 5
ORDER BY p.CreationDate DESC;

Running a full second faster seems like a good thing to me, but there’s a problem.

Ingest

Whether we use the lookup or scan the clustered index, all of these queries ask for rather large memory grants, between 5.5 and 6.5 GB

SQL Server Query Plan Tool Tips — bigsort4u

The operator asking for memory is the Sort — and while I’d love it if we could index for every sort — it’s just not practical.

So like obviously changing optimizer behavior is way more practical. Ahem.

The reason that the Sort asks for so much memory in each of these cases is that it’s forced to order the entire select output from the Posts table by the CreationDate column.

Detach

If we rewrite the query a bit, we can get the optimizer to sort data long before we go get all the output columns:

SELECT TOP (1000)
    u.DisplayName,
    p2.*
FROM dbo.Posts AS p
JOIN dbo.Posts AS p2
    ON p.Id = p2.Id
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 5
ORDER BY p.CreationDate DESC;

SELECT TOP (1000)
    u.DisplayName,
    p2.*
FROM dbo.Posts AS p
JOIN dbo.Posts AS p2
    ON p.Id = p2.Id
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 6
ORDER BY p.CreationDate DESC;

In both cases, we get the same query plan shape, which is what we’re after:

Seek into the nonclustered index on Posts
Sort data by CreationDate
Join Posts to Users first
Join back to Posts for the select list columns

Because the Sort happens far earlier on in the plan, there’s less of a memory grant needed, and by quite a stretch from the 5+ GB before.

Thanks for reading!

Going Further

Recompile Hints Can’t Fix Every SQL Server Query Performance Problem

Posted on December 17, 2020May 15, 2022 by Erik Darling

Fast 1

After blogging recently (maybe?) about filters, there was a Stack Exchange question about a performance issue when a variable was declared with a max type.

After looking at it for a minute, I realized that I had never actually checked to see if a recompile hint would allow the optimizer more freedom when dealing with them.

CREATE INDEX u 
    ON dbo.Users(DisplayName);

DECLARE @d nvarchar(MAX) = N'Jon Skeet';

SELECT 
    COUNT_BIG(*) AS records
FROM dbo.Users AS u
WHERE u.DisplayName = @d;

SELECT 
    COUNT_BIG(*) AS records
FROM dbo.Users AS u
WHERE u.DisplayName = @d
OPTION(RECOMPILE);

Turns out that it won’t, which is surprising.

Even though both plans have sort of a weird seek, the filter operator remains as a weird sort of residual predicate.

Thanks for reading!

Going Further

A Parameterization Puzzle With TOP Follow-Up

Posted on December 16, 2020May 15, 2022 by Erik Darling

Spell It Out

Back in October, I had written a couple posts about how parameterizing TOP can cause performance issues:

Anyway, I got back to thinking about it recently because a couple things had jogged in my foggy brain around table valued functions and parameter sniffing.

Go figure.

Reading Rainbow

One technique you could use to avoid this would be to use an inline table valued function, like so:

CREATE OR ALTER FUNCTION dbo.TopParam(@Top bigint)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
SELECT TOP (@Top)
    u.DisplayName,
    b.Name
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT TOP (1)
        b.Name
    FROM dbo.Badges AS b
    WHERE b.UserId = u.Id
    ORDER BY b.Date DESC
) AS b
WHERE u.Reputation > 10000
ORDER BY u.Reputation DESC;
GO

When we select from the function, the top parameter is interpreted as a literal.

SELECT 
    tp.*
FROM dbo.TopParam(1) AS tp;

SELECT 
    tp.*
FROM dbo.TopParam(38) AS tp;

Performance is “fine” for both in that neither one takes over a minute to run. Good good.

Departures

This is, of course, not what happens in a stored procedure or parameterized dynamic SQL.

EXEC dbo.ParameterTop @Top = 1;

Keen observers will note that this query runs for 1.2 seconds, just like the plan for the function above.

That is, of course, because this is the stored procedure’s first execution. The @Top parameter has been sniffed, and things have been optimized for the sniffed value.

If we turn around and execute it for 38 rows right after, we’ll get the “fine” performance noted above.

EXEC dbo.ParameterTop @Top = 38;

Looking at the plan in a slightly different way, here’s what the Top operator is telling us, along with what the compile and runtime values in the plan are:

It may make sense to make an effort to cache a plan with @Top = 1 initially to get the “fine” performance. That estimate is good enough to get us back to sending the buffers quickly.

Buggers

Unfortunately, putting the inline table valued function inside the stored procedure doesn’t offer us any benefit.

Without belaboring the point too much:

CREATE PROCEDURE dbo.ParameterTopItvf(@Top BIGINT)  
AS  
BEGIN  
    SET NOCOUNT, XACT_ABORT ON;  
  
    SELECT   
        tp.*  
    FROM dbo.TopParam(@Top) AS tp;  
  
END;  
GO 

EXEC dbo.ParameterTopItvf @Top = 1;

EXEC dbo.ParameterTopItvf @Top = 38;

EXEC sp_recompile 'dbo.ParameterTopItvf';

EXEC dbo.ParameterTopItvf @Top = 38;

EXEC dbo.ParameterTopItvf @Top = 1;

If we do this, running for 1 first gives us “fine” performance, but running for 38 first gives us the much worse performance.

Thanks for reading!

Going Further

An Edge Case When Working With Date Parameters In SQL Server

Posted on November 26, 2020May 16, 2022 by Erik Darling

Wheeze Man

When people tell you that working with correct data types is important, it’s for a variety of very good reasons.

Not only can you avoid performance issues, but you can avoid strange query plan distractions, too.

Let’s look at an example for when you use date parameters against datetime columns.

Wrong And Mean

Index from outta nowhere pow!

CREATE INDEX pe ON dbo.Posts(LastEditDate);

The important thing about the LastEditDate column in the Posts table is that it’s nullable.

Not all posts will get edited. Especially mine. They’re always correct the first time.

Basically read only, if we’re being honest about things.

Or maybe it’s about 50/50.

If your columns aren’t nullable, you’ll run into far fewer problems and ambiguities.

I’d like a new data type called ABYSS. Or maybe VOID.

The Problem: Wrong Data Type And NULL Checks

DECLARE @d date = '20170601';
DECLARE @sql nvarchar(MAX) = N'
SELECT
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.LastEditDate > @d
AND   p.LastEditDate IS NOT NULL;'

EXEC sp_executesql @sql, 
                   N'@d date', 
                   @d;
GO

If we pass in a parameter that has a date datatype, rather than date time, an odd thing will happen if we add in a redundant IS NOT NULL check.

The seek predicate will only seek to the first non-NULL value, rather than immediately to the start of the range of dates we care about, which means we end up reading a lot more rows than necessary.

Note the query runtime of 743 milliseconds, and that we end up reading quite a few more rows than we return.

And here I was told Seeks are always efficient ?

Solution One: Stop Checking For NULLs

If we either stop checking for NULLs, we’ll get around the issue.

DECLARE @d date = '20170601';
DECLARE @sql nvarchar(MAX) = N'
SELECT
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.LastEditDate > @d;'

EXEC sp_executesql @sql, 
                   N'@d date', 
                   @d;
GO

The plan for this query looks a bit different, but performance is no worse for the wear.

Note the 25 millisecond execution time. A clear improvement over the 743 milliseconds above. Though the query plan does look a bit odd.

The compute scalar gins up a date range, which is checked in the seek:

I wonder what Expr1002 is up to.

Solution Two: Just Use The Right Datatype To Begin With

In reality, this is what we should have done from the start, but the whole point of this here blog post is to show you what can happen when you Do The Wrong Thing™

When we use the right datatype, we get a simple plan that executes quickly, regardless of the redundant NULL check.

DECLARE @d date = '20170601';
DECLARE @sql nvarchar(MAX) = N'
SELECT
   COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.LastEditDate > @d
AND   p.LastEditDate IS NOT NULL;'

EXEC sp_executesql @sql, 
                   N'@d datetime', 
                   @d;

Here, the NULL check is a residual predicate rather than the Seek predicate, which results in a seek that really seeks instead of just meandering past some NULLs.

Thanks for reading!

Going Further

A SQL Server Parameterization Puzzle With TOP: Part 2

Posted on October 22, 2020May 16, 2022 by Erik Darling

Up From The Floor

Now that we’ve covered what happened with our query, how can we fix it?

Remember when I said that this only happens with literals? I sort of lied.

Sorry.

Pants On Fire

Probably the simplest thing to do would be to set MAXDOP to 1 for the query. Avoiding the parallel exchanges avoids the problem, but the query does run longer than the original with a literal TOP. That being said, it may be the simplest solution in some cases for you if it stabilizes performance.

If you’re feeling as brave as Sir Robin, you can add an OPTIMIZE FOR hint to bring back the early-buffer-send behavior.

CREATE OR ALTER PROCEDURE dbo.SniffedTop (@Top INT)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

    SELECT TOP (@Top)
            u.DisplayName,
            b.Name
    FROM dbo.Users u
    CROSS APPLY 
    (
        SELECT TOP (1) 
                b.Name
        FROM dbo.Badges AS b
        WHERE b.UserId = u.Id
        ORDER BY b.Date DESC
    ) AS b
    WHERE u.Reputation >= 10000
    ORDER BY u.Reputation DESC
    OPTION(OPTIMIZE FOR(@Top = 1));

END;
GO

Are they always better? I have no idea, but if you’ve got long-running queries with a parameterized TOP, this might be something worth experimenting with.

Another rewrite that works is slightly more complicated. Though for maximum benefit, Batch Mode is necessary.

CREATE OR ALTER PROCEDURE dbo.SniffedTop (@Top INT)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

    SELECT TOP (@Top)
            u.DisplayName,
            b.Name
    FROM dbo.Users u
    CROSS APPLY 
    (
        SELECT *
        FROM 
        (
            SELECT *,
            	ROW_NUMBER() 
            	    OVER( PARTITION BY b.UserId
            		      ORDER BY b.Date DESC ) AS n
            FROM dbo.Badges AS b
        ) AS b
        WHERE b.UserId = u.Id
        AND b.n = 1
    ) AS b
    WHERE u.Reputation >= 10000
    ORDER BY u.Reputation DESC;

END;
GO

So that’s fun. We’re having fun. I like fun.

I’m gonna make a PowerPoint about fun.

Other Things, And Drawbacks

So like, you could add a recompile hint to allow the TOP parameter to be sniffed, sure. But then you’re sort of undoing the good you did parameterizing in the first place.

You could also write unparameterized dynamic SQL, but see above. Same problem, plus a blown out plan cache if people ask for different values.

Optimize for unknown, and OFFSET/FETCH also don’t work.

Of course, one thing that would help here is a more appropriate index leading with UserId. However, most good demos arise from less than ideal indexing, so you’re just going to have to deal with it.

Thanks for reading!

Going Further

A SQL Server Parameterization Puzzle With TOP: Part 1

Posted on October 21, 2020May 16, 2022 by Erik Darling

The Gift That Keeps On Giving

I love when a demo written for one purpose turns into an even better demo for another purpose.

While working with a client recently, they ran into a performance issue when trying to promote plan reuse by parameterizing the user-input number for TOP.

In part 1, I’m going to show you what happened and why, and in part 2 I’ll discuss some workarounds.

Regresso Chicken Face Soup

When executed with a literal value in the top, this query runs for around 10 seconds.

I’m not saying that’s great, but it’s a good enough starting place.

SELECT TOP (38)
        u.DisplayName,
        b.Name
FROM dbo.Users u
CROSS APPLY 
(
    SELECT TOP (1) 
            b.Name
    FROM dbo.Badges AS b
    WHERE b.UserId = u.Id
    ORDER BY b.Date DESC
) AS b
WHERE u.Reputation >= 10000
ORDER BY u.Reputation DESC;

If we take that same query, put it in a proc, and run it with an identical value in the TOP, things will turn out not-so-well.

CREATE OR ALTER PROCEDURE dbo.SniffedTop (@Top INT)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

    SELECT TOP (@Top)
            u.DisplayName,
            b.Name
    FROM dbo.Users u
    CROSS APPLY 
    (
        SELECT TOP (1) 
                b.Name
        FROM dbo.Badges AS b
        WHERE b.UserId = u.Id
        ORDER BY b.Date DESC
    ) AS b
    WHERE u.Reputation >= 10000
    ORDER BY u.Reputation DESC;

END
GO 

EXEC dbo.SniffedTop @Top = 38;

The query runs for a significantly longer amount of time.

What Happened?

Unofficially, when TOP uses a constant and the constant is “small” (under 101), the exchange packets are allowed to send earlier than normal, as long as the exchange is below the TOP operator. They’re allowed to send as soon as they have a row, rather than waiting for them to fill up completely.

This can only happen with constants (or…!), and the behavior is true going back to 2005. It may change in the future, so if you’re reading this at some far off date, please don’t be too harsh on me in the comments.

When you parameterize TOP, it’s considered unsafe to send the exchange buffers early. After all, you could stick anything in there, up through the BIGINT max. In cases where you’ve got a BIG TOP, sending, say, 9,223,372,036,854,775,807 rows one at a time would be significantly ickier than sending over a smaller number of full exchange buffers.

If you’re surprised to hear that parallel exchange buffers can send at different times, you’re not alone. I was also surprised.

SQL Server: Full Of Surprises. Horrible surprises.

In the second query, where exchange buffers are sent when full, we spend a lot longer waiting for them to fill up. This isn’t exposed anywhere in the plan, and you’d need either a debugger or this blog post to figure it out.

Yep.

Yep. Yep. Yep. Yep. Yep. Yep.

Thanks for reading!

Going Further

Starting SQL: Unparameterized Strings In Dynamic SQL

Posted on September 30, 2020May 16, 2022 by Erik Darling

Roaster

It doesn’t take much to let a bad person get at your data. I mean, the internet. Right? What a mistake.

Most of the time, you need to parameterize your code to avoid SQL injection. Hooray, we did it.

But there are some aspects to dynamic SQL where you can’t use parameters, like database, schema, table and column names. Worse, if you need to pass in or build a list of columns, you can’t possibly parameterize each one.

These strings are where the biggest potential for problems lies, though. They’ll often be declared are some long variable length to safeguard against string truncation, which leaves plenty of room to tack on whatever additional payload you want the query to execute.

Not Dropping, Dumping

When it comes to permissions, it’s somewhat easier to disallow an app login from dropping databases or tables. Of course, it’s a bit of a joke expecting that sort of granularity from most applications.

They’ll all wanna do something that requires that privilege occasionally, so will be granted the privilege perpetually.

What a nice time that is.

But let’s face it, dropping things is at best a temporary disruption. Stealing data is forever, especially if you can just keep stealing it. For example, this query can be used to get the results of what’s in sys.tables along with regular query results. It doesn’t take a rocket surgeon to figure out how things line up.

DECLARE @SQLString NVARCHAR(MAX) = N'';
DECLARE @Filter NVARCHAR(MAX) = N'';
DECLARE @nl NCHAR(2) = NCHAR(13);
DECLARE @Title NVARCHAR(250) =  N''' 
UNION ALL 
SELECT t.object_id, 
       t.schema_id, 
       t.name, 
       SCHEMA_NAME(t.schema_id),
       t.create_date,
       t.modify_date,
       NULL
FROM sys.tables AS t --';

SET @SQLString = @SQLString + 
N'
SELECT TOP (5000) 
    p.OwnerUserId,
    p.Score,
    p.Tags,
    p.Title, 
    p.CreationDate, 
    p.LastActivityDate, 
    p.Body
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656 ' + @nl;

IF @Title IS NOT NULL 
   BEGIN SET @Filter = @Filter + N'   AND p.Title LIKE ''' + N'%' + @Title + N'%''' + @nl; END;

IF @Filter IS NOT NULL
   BEGIN SET @SQLString += @Filter; END;

SET @SQLString += N'   ORDER BY p.Score DESC;';

PRINT @SQLString;
EXEC (@SQLString);

And of course, after sys.tables you have sys.columns, and once you know which columns are in which table you want to go after, the rest is soft cheese.

Zero Injection Policy

If we wanted to not have that happen, we could write our SQL like this instead:

DECLARE @SQLString NVARCHAR(MAX) = N'';
DECLARE @Filter NVARCHAR(MAX) = N'';
DECLARE @nl NCHAR(2) = NCHAR(13);
DECLARE @Title NVARCHAR(250) =  N''' 
UNION ALL 
SELECT t.object_id, 
       t.schema_id, 
       t.name, 
       SCHEMA_NAME(t.schema_id),
       t.create_date,
       t.modify_date,
       NULL
FROM sys.tables AS t --';

SET @SQLString = @SQLString + 
N'
SELECT TOP (5000) 
    p.OwnerUserId,
    p.Score,
    p.Tags,
    p.Title, 
    p.CreationDate, 
    p.LastActivityDate, 
    p.Body
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656 ' + @nl;

IF @Title IS NOT NULL 
   BEGIN SET @Filter = @Filter + N'   AND p.Title LIKE %@Title% ' + @nl; END;

IF @Filter IS NOT NULL
   BEGIN SET @SQLString += @Filter; END;

SET @SQLString += N'   ORDER BY p.Score DESC;';

PRINT @SQLString;
EXEC sys.sp_executesql @SQLString, 
                       N'@Title NVARCHAR(250)',
                       @Title;

The difference in output is quite apparent:

But What About!

Yes, all those object names. Terrible about them, huh?

Of course, we have QUOTENAME to save us from those, and examples of it aren’t terribly interesting. This time, you get a link to the docs page.

Thanks for reading!