How String Data Can Hurt Query Performance In SQL Server

Mistakes!


Strings cause nothing but problems in databases.

  • Standard vs. Unicode
  • Odd collations
  • Inflated memory grants
  • Oversized columns because of truncation
  • String splitting

And of course, sometimes they can cause plans to be costed strangely.

Strong Tradition


Working with the queries we’ve been tinkering with in all the posts this week, let’s look at a slightly different oddity.

/*Q1*/
SELECT u.Id, 
       u.DisplayName, 
       u.Reputation, 
       ca.Id, 
       ca.Type, 
       ca.CreationDate,
       ca.Text --Text in the select list
FROM dbo.Users AS u
OUTER APPLY
(
SELECT c.Id,
       DENSE_RANK() 
           OVER ( PARTITION BY c.PostId 
                  ORDER BY c.Score DESC ) AS Type,
        c.CreationDate,
        c.Text
FROM dbo.Comments AS c
WHERE c.UserId = u.Id
) AS ca
WHERE ca.Type = 0;


/*Q2*/
SELECT u.Id, 
       u.DisplayName, 
       u.Reputation, 
       ca.Id, 
       ca.Type, 
       ca.CreationDate
       -- No Text in the select list
FROM dbo.Users AS u
OUTER APPLY
(
SELECT c.Id,
       DENSE_RANK() 
           OVER ( PARTITION BY c.PostId 
                  ORDER BY c.Score DESC ) AS Type,
        c.CreationDate,
        c.Text
FROM dbo.Comments AS c
WHERE c.UserId = u.Id
) AS ca
WHERE ca.Type = 0;

The first query has the Text column in the outer select list, and the second query doesn’t. Please read the comments for additional clarity.

Big Plans


The plan without Text in the outer project goes parallel, and the one with it does not.

SQL Server Query Plan
pain pain

But why?

Forcing The Issue


Let’s add a third query into the mix to force the query to go parallel.

/*Q3*/
SELECT u.Id, 
       u.DisplayName, 
       u.Reputation, 
       ca.Id, 
       ca.Type, 
       ca.CreationDate,
       ca.Text --Text in the select list
FROM dbo.Users AS u
OUTER APPLY
(
SELECT c.Id,
       DENSE_RANK() 
           OVER ( PARTITION BY c.PostId 
                  ORDER BY c.Score DESC ) AS Type,
        c.CreationDate,
        c.Text
FROM dbo.Comments AS c
WHERE c.UserId = u.Id
) AS ca
WHERE ca.Type = 0
OPTION(USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'));

Things are pretty interesting, here.

SQL Server Query Plan
caloric surplus

The parallel plan is actually costed cheaper up through the Filter operator. In the serial plan, the entire subtree costs 35,954. In the parallel plan, it’s at 35,719.

At 200 query bucks cheaper, we’re in good shape! And then… We Gather Streams ☹

SQL Server Query Plan
creaky french

Mortem


The Gather Streams pushes the final plan cost for the parallel plan up higher than the serial plan.

Even though the parallel plan finishes ~26 seconds faster, the optimizer doesn’t choose it naturally because it is a cheapskate.

Bummer, huh?

An important point to keep in mind is that in nested loops join plans, the inner side of the query doesn’t receive any cost adjustments for parallel vs. serial versions. All of the costing differences will exist on the outside.

That’s why only the last few operators in the plan here are what makes a difference.

And that’s what we’ll finish out the week with!

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.



2 thoughts on “How String Data Can Hurt Query Performance In SQL Server

  1. I’ve tried to change the text column;
    left(c.Text,1) as text

    It helps, but not by much.

    1. When you say that it helps but not by much, what do you mean?

      Putting that, or CONVERT(NVARCHAR(1), c.Text) AS Text into the select list of the outer apply results in a parallel plan.

Comments are closed.