Different Ways To Get The Highest Value Part 1

Expresso


Let’s say you wanna get the highest thing. That’s easy enough as a concept.

Now let’s say you need to get the highest thing¬†per user. That’s also easy enough to visualize.

There are a bunch of different ways to choose from to write it.

In this post, we’re going to use four ways I could think of pretty quickly, and look at how they run.

The catch for this post is that we don’t have any very helpful indexes. In other posts, we’ll look at different index strategies.

Query #1

To make things equal, I’m using CROSS APPLY in all of them.

The optimizer is free to choose how to interpret this, so WHATEVER.

SELECT u.Id,
       u.DisplayName,
	   u.Reputation,
	   ca.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT MAX(Score) AS Score
	FROM dbo.Posts AS p
	WHERE p.OwnerUserId = u.Id

) AS ca
WHERE u.Reputation >= 100000
ORDER BY u.Id;

The query plan is simple enough, and it runs for ~17 seconds.

Big hitter

Query #2

This uses TOP 1.

SELECT u.Id,
       u.DisplayName,
	   u.Reputation,
	   ca.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT TOP (1) p.Score
	FROM dbo.Posts AS p
	WHERE p.OwnerUserId = u.Id
	ORDER BY p.Score DESC

) AS ca
WHERE u.Reputation >= 100000
ORDER BY u.Id;

The plan for this is also simple, but runs for 1:42, and has one of those index spool things in it.

Unlucky

Query #3

This query uses row number rather than top 1, but has almost the same plan and time as above.

SELECT u.Id,
       u.DisplayName,
	   u.Reputation,
	   ca.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT p.Score,
	       ROW_NUMBER() OVER (ORDER BY p.Score DESC) AS n
	FROM dbo.Posts AS p
	WHERE p.OwnerUserId = u.Id
) AS ca
WHERE u.Reputation >= 100000
AND ca.n = 1
ORDER BY u.Id;
Why send me silly notes?

Query #4

Also uses row number, but the syntax is a bit more complicated.

The row number happens in a derived table inside the cross apply, with the correlation and filtering done outside.

SELECT u.Id,
       u.DisplayName,
	   u.Reputation,
	   ca.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT * 
	FROM 
	(
        SELECT p.OwnerUserId,
	           p.Score,
	           ROW_NUMBER() OVER (PARTITION BY p.OwnerUserId 
			                      ORDER BY p.Score DESC) AS n
	    FROM dbo.Posts AS p
	) AS p
	WHERE p.OwnerUserId = u.Id
	AND p.n = 1
) AS ca
WHERE u.Reputation >= 100000
ORDER BY u.Id;

This is as close to competitive with Query #1 as we get, at only 36 seconds.

That’s a lot of writing.

Wrap Up


If you don’t have helpful indexes, the MAX pattern looks to be the best.

Granted, there may be differences depending on how selective data in the table you’re aggregating is.

But the bottom line is that in that plan, SQL Server doesn’t have to Sort any data, and is able to take advantage of a couple aggregations (partial and full).

It also doesn’t spend any time building an index to help that one.

In the next couple posts, we’ll look at different ways to index for queries like this.

Thanks for reading!



One thought on “Different Ways To Get The Highest Value Part 1

Leave a Reply

Your email address will not be published. Required fields are marked *