SQL Server Scalar UDF Inlining And Security Functions

You’ve Got No Security


Despite being a bouncer for many years, I have no interest at all in security.

Users, logins, roles, grant, deny. Not for me. I’ve seen those posters, and they’re terrifying.

Gimme 3000 lines of dynamic SQL any day.

This is a slightly different take on yesterday’s post, which is also a common problem I see in queries today.

Someone wrote a function to figure out if a user is trusted, or has the right permissions, and sticks it in a predicate — it could be a join or where clause.

High Finance


Stack Overflow isn’t exactly a big four accounting firm, but for some reason big four accounting firms don’t make their databases public under Creative Commons licensing.

So uh. Here we are.

And here’s our query.

DECLARE @UserId INT = 22656, --2788872, 22656
		@SQL NVARCHAR(MAX) = N'';

SET @SQL = @SQL + N'
SELECT    p.Id,
          p.AcceptedAnswerId,
          p.AnswerCount,
          p.CommentCount,
          p.CreationDate,
          p.FavoriteCount,
          p.LastActivityDate,
          p.OwnerUserId,
          p.Score,
          p.ViewCount,
          v.BountyAmount,
          c.Score
FROM      dbo.Posts AS p
LEFT JOIN dbo.Votes AS v
    ON p.Id = v.PostId
	AND dbo.isTrusted(@iUserId) = 1
LEFT JOIN dbo.Comments AS c
    ON p.Id = c.PostId
WHERE     p.PostTypeId = 5;
';

EXEC sys.sp_executesql @SQL,
                       N'@iUserId INT',
					   @iUserId = @UserId;

There’s a function in that join to the Votes table. This is what it looks like.

CREATE OR ALTER FUNCTION dbo.isTrusted ( @UserId INT )
RETURNS BIT
WITH SCHEMABINDING, RETURNS NULL ON NULL INPUT
AS
    BEGIN
        DECLARE @Bitty BIT;

        SELECT @Bitty = CASE WHEN u.Reputation >= 10000 
		                     THEN 1 
							 ELSE 0 
					    END
        FROM   dbo.Users AS u
        WHERE  u.Id = @UserId;
        
		RETURN @Bitty;

    END;
GO

Bankrupt


There’s not a lot of importance in the indexes, query plans, or reads.

What’s great about this is that you don’t need to do a lot of analysis — we can look purely at runtimes.

It also doesn’t matter if we run the query for a trusted (22656) or untrusted (2788872) user.

In compat level 140, the runtimes look like this:

ALTER DATABASE StackOverflow2013 SET COMPATIBILITY_LEVEL = 140;

 SQL Server Execution Times:
   CPU time = 7219 ms,  elapsed time = 9925 ms.

 SQL Server Execution Times:
   CPU time = 7234 ms,  elapsed time = 9903 ms.

In compat level 150, the runtimes look like this:

ALTER DATABASE StackOverflow2013 SET COMPATIBILITY_LEVEL = 150;

 SQL Server Execution Times:
   CPU time = 2734 ms,  elapsed time = 781 ms.

 SQL Server Execution Times:
   CPU time = 188 ms,  elapsed time = 142 ms.

In both runs, the trusted user is first, and the untrusted user is second.

Sure, the trusted user query ran half a second longer, but that’s because it actually had to produce data in the join.

One important thing to note is that the query was able to take advantage of parallelism when it should have (CPU time is higher than elapsed time).

In older versions (or even lower compat levels), scalar valued functions would inhibit parallelism. Now they don’t when they’re inlined.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.



4 thoughts on “SQL Server Scalar UDF Inlining And Security Functions

  1. I tried Scalar UDF Inlining in a real world situation. We had a function which modifies a column, trimming it, capitalising it and replacing any double spaces with single spaces. Tested it using a similar situation you tested and got similar results you did.

    Then I put it into the real query. Of course the real query has around 40 columns, 25 or so are columns that we want to call the function on. Tested the performance and…. well, because we were doing it on 25 columns and not one, it made performance considerably worse. Query time to execute went from 1h40m to 2h42m and the plans of course looked a lot different, the Scalar Inlined version was horrific.

    Which leads me to conclude that you can’t actually use this feature in the real world. Because that function that I talked about will considerably speed up processing if it’s used singularly in a list of SELECTs. But use it multiple times in your SELECT and there will be a threshold where it will start performing worse than having the UDF executed in the traditional way. But can you set the UDF to be inlined in one query but not another? As far as I can see, you can only set it to be on or off at the function declaration level or the database level. Hence you cannot say “Use it when I am using the function once, but not when it’s used 15 times”.

    1. The phrase that’s sticking out to me here is “Which leads me to conclude that you can’t actually use this feature in the real world”.

      It’s sticking out because the real world means different things to different people, and its meaning is usually based on just their very specific circumstances. This seems to hold true for you, too.

      The real world you’ve created, where you wrote a function to trim, capitalize, and replace double spaces, and you use that to modify 40 columns, is a terrible one. That’s your fault, and that’s not the kind of thing UDF inlining is supposed to fix. Nor is it the kind of process I’d use to gauge the feature overall. It’s like you saying we shouldn’t plant trees on earth because trees won’t help mars.

      You’ve gone out of your way to ensconce yourself in a bad idea and terrible practice. It’s not the feature that needs fixing, here.

      1. Got the same scenario in the past. Cleaning procedure that ran 2 hours nightly. I optimized it down to 10 minutes, and guess what they did with time they saved? Used this optimized procedure to clean other tables and it now runs for more than 2 hours every night and cannot be sped up. Using just a trigger to clean up data once when it’s inserted or updated and when it really requires cleaning would be an optimal solution, because the amount of inserts and updates was small. However customer stubbornly refused. I don’t care anymore. Earned my money and walked away.

Comments are closed.