Dealing With SQL Server Performance Problems From Unparameterized IN Clauses From Entity Framework

Seriously


If you’re using Entity Framework, and sending in queries that build up IN clauses, they won’t end up getting parameterized.

Even Forced Parameterization won’t help you if you’re sending in other parameters: one of its limitations is that it doesn’t kick in for partially parameterized queries.

Even if they did get parameterized, well, what’s one funny thing about IN clauses? You don’t know how many values will be in them.

You’d end up with a different plan for every variation in the number of values, though at least you’d get plan reuse within those groups.
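To make that concrete, here’s a minimal sketch of the kind of query I’m talking about, using the same StackOverflowContext and Users entity as the code further down. The Reputation filter is just a stand-in for “some other parameter” and is my own example, not from any real code base.

using (var context = new StackOverflowContext())
{
    var userIds = new List<int> { 1, 2, 3 };
    var minReputation = 1000;

    // Contains() gets translated into a literal IN list, not parameters.
    // The generated SQL comes out looking something like:
    //   ... WHERE [Id] IN (1, 2, 3) AND [Reputation] > @p__linq__0
    // The second predicate is parameterized and the IN list isn't, so the
    // query is only partially parameterized -- exactly the case that
    // Forced Parameterization skips.
    var users = context.Users
        .Where(u => userIds.Contains(u.Id) && u.Reputation > minReputation)
        .ToList();
}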

Griper


Like most people, I use Twitter to gripe and brag. So here we are:

Yes, I’m surprised that a code base that has been around for as long as Entity Framework hasn’t already dealt with this problem. I’ve said it before: someone ought to introduce the Entity Framework team to the SQL Server team.

But thankfully, some people use Twitter to be helpful, like ErikEJ (b|t|g), who has a solution that works with EF Core.

Alt Code


My friend Josh (b|t) helped me out with some code that works in non-Core versions of Entity Framework too.

using (var context = new StackOverflowContext())
{
    // Log the generated SQL so you can see the parameters
    context.Database.Log = Console.WriteLine;

    // http://www.albahari.com/nutshell/predicatebuilder.aspx
    var predicate = PredicateBuilder.False<User>();

    // userIds is a List<int> built elsewhere; always build the same number
    // of predicates, repeating the last id once the real values run out
    for (int i = 0; i < 100; i++)
    {
        var value = userIds[i >= userIds.Count - 1 ? userIds.Count - 1 : i];
        predicate = predicate.Or(u => u.Id == value);
    }

    var users = context.Users
        .AsExpandable() // http://www.albahari.com/nutshell/linqkit.aspx
        .Where(predicate)
        .ToList();
}

This is helpful when you have an upper limit on the number of values that could end up in your IN clause. Because the loop pads the list out with the last value, you always generate the same number of parameters (100, in the code above), which means you get one query plan regardless of how many values actually end up in there.

I do not suggest setting this to an arbitrarily high number as a catch-all. You will not be happy.
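If you’d rather not repeat that index math everywhere, a small helper along these lines (hypothetical, not part of Josh’s code) does the padding up front; you can then build the predicate with a plain foreach over the padded list, like the version below.

// Hypothetical helper: pad (or truncate) a list of ids to a fixed length so
// the generated query always ends up with the same number of parameters.
// Assumes userIds has at least one element.
private static List<int> PadIds(List<int> userIds, int fixedLength)
{
    var padded = new List<int>(fixedLength);

    for (int i = 0; i < fixedLength; i++)
    {
        // Repeat the last id once the real values run out; duplicate values
        // don't change the result of an OR chain of equality checks.
        padded.Add(userIds[i >= userIds.Count - 1 ? userIds.Count - 1 : i]);
    }

    return padded;
}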

If you don’t have a known number, you can use this:

using (var context = new StackOverflowContext())
{
    // http://www.albahari.com/nutshell/predicatebuilder.aspx
    var predicate = PredicateBuilder.False<User>();

    // One parameter per id, so the parameter count varies with the list size
    foreach (var id in userIds)
    {
        predicate = predicate.Or(u => u.Id == id);
    }

    var users = context.Users
        .AsExpandable() // http://www.albahari.com/nutshell/linqkit.aspx
        .Where(predicate)
        .ToList();
}

And of course, even though it’s probably better than no parameterization at all, you will still get different query plans for different numbers of values.
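If you want to check what you’re actually getting, the Database.Log hook from the first example prints the generated SQL and its parameters to the console, so you can confirm the ids show up as parameters rather than literals, and watch the parameter count track the size of the list. Something like this, reusing the names from the examples above:

using (var context = new StackOverflowContext())
{
    // Print the generated SQL and parameters as EF runs the query
    context.Database.Log = Console.WriteLine;

    var predicate = PredicateBuilder.False<User>();

    // Three ids means three parameters in the logged SQL; five ids would
    // mean five parameters, and a different cached plan
    foreach (var id in new List<int> { 1, 2, 3 })
    {
        predicate = predicate.Or(u => u.Id == id);
    }

    var users = context.Users
        .AsExpandable()
        .Where(predicate)
        .ToList();
}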

And did I mention that Josh has a Great Series Of Posts™ on bad EF practices?

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.



4 thoughts on “Dealing With SQL Server Performance Problems From Unparameterized IN Clauses From Entity Framework”

  1. I had a similar (but a bit different) issue with SAP software that was generating IN clauses that were so long that the queries were failing due to the length. For example: SELECT CustomerID, SalesOrderID, [SubTotal], [TotalDue] FROM Sales.SalesOrderHeader WHERE CustomerID IN ({lists over 10,000}). This was throwing Event ID 8623 errors and the queries never ran. The solution was to generate a temporary table and use a join instead.

    Would parameterization work any better with a temp table and join solution?
