Not In And Nullables

What This Isn’t

I’m not writing about how if you use NOT IN, and suffer from NULL values in a column you’re comparing, you’ll get an empty result.

That’s easy enough to show:

What’s sort of more interesting to me is what happens in execution plans when you’re comparing NULL-able columns that don’t contain any NULLs.

Exampled!

Here’s the first query I want to show you. In the Posts table, the OwnerUserId column is NULLable, but doesn’t have any NULLs in it. The Id column in the Users table is not NULLable — it’s the PK/CX on the table.

Note that for all these queries, I’ve created these indexes:

The important part of the query looks like this:

Good thing I have great indexes.

Zooming in to a sorta weird part of the query:

I’m in it

The optimizer spins up a Row Count Spool to make sure it’s right about the lack of NULLs. You can see the seek predicate doing so, and finding 0 rows, but taking ~200ms to do it. This time was much worse without those great indexes.

Explicitly Not NULL

If we change our query slightly, we can get a less exotic query plan:

Not Null’em

Which is a little bit faster overall, at ~350ms vs 550ms.

This is the equivalent of writing your query the way you should have written it in the first place, using NOT EXISTS.

Recap

The optimizer is a weird place. It’s the ultimate defensive driver. Dealing with NULLs can be quite odd and difficult.

I generally save IN and NOT IN for when I have short lists of literal values. Likely, EXISTS/NOT EXISTS when values are contained in other tables, or dumping values into a #temp table when there’s a lot of values can be more efficient, if only to make better indexing available.

While IN doesn’t have the same side effects as NOT IN with NULLs, I think using EXISTS instead is just a good habit.

Thanks for reading!

Leave a Reply

Your email address will not be published. Required fields are marked *