Partitioning and Parallelism: Seeking Redemption

Facing Up


Yesterday we looked at how parallel scans grab rows from partitioned tables. All of those queries were performing full scans of the Votes table, and in some cases threads were given no work to do. That could look troubling in a parallel query, because we’ve given worker threads to this query and they appear to be doing nothing.

Despite my oft frustration with Repartition Streams, it can come in quite handy, especially here.

In today’s experiment, we’re going to look at how plans running at different DOPs can be faster when seeking to partitions.

I’ll let you decide if these queries are being run on Standard Edition, or Developer/Enterprise Edition.

The setup


Because I want to test seeking into partitions, and a potentially more realistic query scenario with a couple joins, the query form is changing a little bit.

SELECT
    DATEPART(YEAR, vp.CreationDate) AS VoteYear,
    DATEPART(MONTH, vp.CreationDate) AS VoteMonth,
    COUNT_BIG(DISTINCT vp.PostId) AS UniquePostVotes,
    SUM(vp.BountyAmount) AS TotalBounties
FROM dbo.Votes16_p AS vp
JOIN dbo.Comments AS c
    ON c.PostId = vp.PostId
JOIN dbo.Users AS u
    ON u.Id = c.UserId
WHERE vp.CreationDate BETWEEN '2013-01-01 00:00:00.000'
                          AND '2013-12-31 00:00:00.000'
GROUP BY
    DATEPART(YEAR, vp.CreationDate),
    DATEPART(MONTH, vp.CreationDate)

Very realistically bad, that. You people.

The way this is written, the query will access two partitions that contain data for the year 2013.

CREATE PARTITION FUNCTION VoteYear16_pf(DATETIME)
    AS RANGE RIGHT FOR VALUES
    (
        '20080101',
        '20080601',
        '20090101',
        '20090601',	    
        '20100101',
        '20100601',
        '20110101',
        '20110601',
        '20120101',
        '20120601',
        '20130101',
        '20130601',
        '20140101',
        '20140601'
    );
GO

At DOP 4


All four threads seeking into the Votes table get rows to work on, and the entire thing runs for close enough to 14 seconds for me not to care.

buggy
splitted

If one were to find themselves in the midst of having nothing to do, one might find that two threads each got groovy with two partitions.

At DOP 8


Things get a little awkward.

🤡

We end up with three threads that don’t do anything, sort of like in yesterday’s post when we had to touch empty partitions.

Despite the seek into the Votes table here taking about 350ms longer, the query overall runs about 5 seconds faster.

Fortunately, there’s a helpful repartition streams after the index seek that rebalances rows on threads.

thanks, robot

I’m not adding all those up for you.

At DOP 16


Cha-ching, etc.

dopophobe

Eight threads get work, and eight don’t. The seek now takes a touch longer, but the query itself now finishes in 7.6 seconds.

Just like above, a repartition streams after the seek evens out row distributions.

Thinkin’ Tree


Even though higher DOPs are technically less helpful seeking into the Votes table, they obviously have some benefit to other operations in the query plan.

Whether or not it’s always worth the sacrifice takes some testing, and it might change based on how many partitions you’re touching.

Don’t be too concerned about lopsided parallelism at the seek or scan, as long as you have a repartition streams that adequately rebalances things afterwards.

But as people not from Australia are fond of saying, you should never run queries above DOP 8 anyway 😃

Thanks for reading!



Leave a Reply

Your email address will not be published. Required fields are marked *