Spill Levels Are Weird

Get Low


There’s a wonderful explanation from Paul White about spill levels here. At the risk of offending the copyright godses, I’m going to quote it here.

Consider the task of sorting 4000MB of data, when we only have 500MB of memory available. Obviously, we cannot sort the whole set in memory at once, but we can break the task down:

We first read 500MB of data, sort that set in memory, then write the result to disk. Performing this a total of 8 times consumes the entire 4000MB input, resulting in 8 sets of sorted data 500MB in size. The second step is to perform an 8-way merge of the sorted data sets. Note that a merge is required, not a simple concatenation of the sets since the data is only guaranteed to be sorted as required within a particular 500MB set at the intermediate stage.

Alright, interesting! That’d make me think that the number of pages involved in the spill would increase the spill level. In real life, I saw spill level 11 once. I wish I had that plan saved.

Rowed To Nowhere


Here’s a pretty big spill. But it’s only at level two.

Insert commas.

That’s 8,115,283,973 rows.

Here’s a much larger spill that’s still only at level two.

Insert more commas.

That’s um… hang on.

231,424,059,820 rows. It’s crazy that we can add 223,308,775,847 rows to a spill and not need more passes than before.

But hey, okay. That’s cool. I get it.

Waaaaaiiiiitttt

I found level three! With uh 97,142 pages. That’s weird. I don’t get it.

But whatever, it probably can’t get much weirder than

SPILL LEVEL 8 FOR 43,913 PAGES 1-800-COME-ON-YO!

This Is Gonna Take More Digging


I’m gonna have my eye on weird spill levels from now on, and I’ll try to follow up as I figure stuff out.

Thanks for reading!



One thought on “Spill Levels Are Weird

Leave a Reply

Your email address will not be published. Required fields are marked *