Why Comparing Columns Doesn’t Always Seek

Adjoining


When writing queries, sometimes you have the pleasure of being able to pass a literal value, parameter,  or scalar expression as a predicate.

With a suitable index in place, any one of them can seek appropriately to the row(s) you care about.

But what about when you need to compare the contents of one column to another?

It gets a little bit more complicated.

All About Algorithms


Take this query to start with, joining Users to Posts.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id;

The OwnerUserId column doesn’t have an index on it, but the Id column on Users it the primary key and the clustered index.

But the type of join that’s chosen is Hash, and since there’s no where clause, there’s no predicate to apply to either table for filtering.

jumbo

This is complicated slightly by the Bitmap, which is created on the OwnerUserId column from the Posts table and applied to the Id column from the Users table as an early filter.

However, it’s a useless Bitmap, and Bitmaps don’t really seek anyway.

The same pattern can generally be observed with Merge Joins.  Where things are a bit different is with Nested Loops.

Shoop Da Loop


If we use a query hint, we can see what would happen with a Nested Loops Join.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
OPTION(LOOP JOIN);

The plan looks like this now, with a Seek on the Users table.

petting motions

The reason is that this flavor of Nested Loops, known as Apply Nested Loops, takes each row from the outer input and uses it as a scalar operator on the inner input.

An example of Regular Joe Nested Loops™ looks like this:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.Reputation = 
(
    SELECT
        MIN(p.Score)
    FROM dbo.Posts AS p
);

Where the predicate is applied at the Nested Loops operator:

and bert

Like most things, indexing is key, but there are limits.

Innermost


Let’s create this index:

CREATE INDEX ud ON dbo.Users(UpVotes, DownVotes);

And run this query:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.UpVotes = u.DownVotes;

The resulting query plan looks like this:

did a cuss

But what other choice is there? If we want a seek, we need a particular thing or things to seek to.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.UpVotes = u.DownVotes
AND   u.UpVotes = 1;
name game

We seek to everyone with an UpVote of 1, and then somewhat awkwardly search the DownVotes column for values >= 1 and <= 1.

But again, these are specific values we can search for.

Thanks for reading!

 



One thought on “Why Comparing Columns Doesn’t Always Seek

  1. That involves more complexity to the query optimizer: in few ms it should be able to help producing a “good estimated” query.
    That is because the prior requirement: spend less time as mach as you can, because the client is waiting for results.
    Interesting idea… what about a post execution re-optimization occurring when there is less work to do (or a separate process having this job)?

Leave a Reply

Your email address will not be published. Required fields are marked *