Greatest, Least, and SQL Server

Spinning Out Of Control


It’s sorta kinda pretty crazy when every major database platform has something implemented, and SQL Server doesn’t.

Geez, even MySQL.

But a fairly common need in databases is to find the max value from two columns.

Maybe even across two tables.

Track V


For one table, it’s fairly straight forward.

SELECT     MAX(x.CombinedDate) AS greatest
FROM       dbo.Users AS u
CROSS APPLY( VALUES( u.CreationDate ), ( u.LastAccessDate )) AS x( CombinedDate );

We’re using our old friend cross apply with a values clause to create on “virtual” column from two date columns.

As far as indexing goes, I couldn’t find any performance difference between these two. They both take about 1 second.

CREATE INDEX smoochies ON dbo.Users(CreationDate, LastAccessDate);
CREATE INDEX woochies ON dbo.Users(LastAccessDate, CreationDate);

Indexing strategy will likely rely on other local factors, like any where clause filtering.

Monolith


A similar pattern will work across two tables:

SELECT     MAX(x.Score)
FROM       dbo.Posts AS p
JOIN       dbo.Comments AS c
    ON p.Id = c.PostId
CROSS APPLY( VALUES( p.Score ), ( c.Score )) AS x ( Score );

Though this is the worst possible way to write the query. It runs for around 10 seconds.

The indexes I have for this query look like so:

CREATE INDEX thicc ON dbo.Posts(Id, Score);
CREATE INDEX milky ON dbo.Comments(PostId, Score);

Reversing the key column order helps — the query runs in about 3 seconds, but I need to force index usage.

Of course, this is still the second worst way to write this query.

The best way I’ve found to express this query looks like so:

SELECT MAX(x.Score)
FROM
    (
     SELECT MAX(p.Score) AS Score
     FROM dbo.Posts AS p
    ) AS p
CROSS JOIN
    (
     SELECT MAX(c.Score) AS Score
     FROM dbo.Comments AS c
    ) AS c
CROSS APPLY( VALUES( p.Score ), ( c.Score )) AS x( Score );

The cross join here isn’t harmful because we only produce two rows.

And it finishes before we have time to move the mouse.

Mousey

Likewise, the faster pattern for a single table looks like this:

SELECT MAX(x.Dates)
FROM
    (
     SELECT MAX(u.CreationDate) CreationDate
     FROM dbo.Users AS u
    ) AS uc
CROSS JOIN
    (
     SELECT MAX(u.LastAccessDate) LastAccessDate
     FROM dbo.Users AS u
    ) AS ul
CROSS APPLY (VALUES (uc.CreationDate), (ul.LastAccessDate)) AS x (Dates);

Because we’re able to index for each MAX

CREATE INDEX smoochies ON dbo.Users(CreationDate);
CREATE INDEX woochies ON dbo.Users(LastAccessDate);

Of course, not every query can be written like this, or indexed for perfectly, but it’s gruel for thought if you need specific queries like this to be as fast as possible.

Thanks for reading!

A Quick Primer On Date Math

Come Up


I’m going to be writing a few posts about things I see people doing with date math, or as I’ve started calling it, “dath*”, that can really hurt performance.

To get you ready for those, I need to talk about how functions like DATEADD and DATEDIFF work. It seems like that’s missing knowledge in many places.

Make Myself A Winner


As of this writing, if I run this query, I get these results:

SELECT SYSDATETIME() AS sys_date;

sys_date
2019-11-08 17:22:10.6886473

It has a date part, and a time part. Right? Cool.

The first thing I need to show you is that with the DATEADD and DATEDIFF functions, some dates and numbers are interchangeable.

For instance:

SELECT DATEDIFF(DAY, 0, SYSDATETIME()) AS [what does zero mean?],
          DATEDIFF(DAY, '19000101', SYSDATETIME()) AS [it means 1900-01-01];

Running these will both return the same value for me: 43,775. In other words, 0 and 1900-01-01 are interchangeable.

If you’re playing along at home, stick negative numbers in there too.

It might be a different result from mine when you run it, but they’ll both be the same locally.

To prove this out a little more: If we add 43,775 days to 0 (1900-01-01), we get today’s date (2019-11-08).

SELECT DATEADD( DAY, 
                DATEDIFF(DAY, 0, SYSDATETIME()), 
			    0 ) 
			    AS [add 43775 days to 1900-01-01];

In this example, I’m saying that I want to add the results of the datediff function to the date 1900-01-01.

These functions are pretty powerful, because they can accept expressions, literals, variables, parameters, column values, and so on.

The important part about this one is that the time is zeroed out.

What’s The Point?


I’ll often see people need to “flatten” dates to certain intervals.

By flatten, I mean the start of a day, week, month, or year, and likewise for the end values to all those intervals.

I’ve seen some really bad implementations of this. Most notable was probably a scalar valued function that converted a datetime to a 10 character string to remove the time portion.

In every where clause.

For much better implementations, check out these posts:

Thanks for reading!

*I don’t really call it that don’t be ludicrous

Ask A Prospective DBA This One Question

Are You Hiring A DBA?


No, not because I’m looking. It’s just that a lot of companies, if they’re hiring their first DBA, or if they need a new one, don’t know how to start when screening candidates.

You can ask all sorts of “easy” questions.

  • What port does SQL Server use?
  • What does DMV mean?
  • What’s the difference between clustered and nonclustered indexes?

But none of those really make people think, and none of them really let you know if the candidate is listening to you.

They have autopilot answers, and you can judge how right or wrong they are pretty simply.

What’s MAXDOP Divided By Cost Threshold?


Tell them you’re setting up a brand new server, and you don’t wanna lose more than 5 minutes of data.

Ask them how they’d set up backups for that server.

If the shortest backup interval is more than five minutes apart, they’re likely not a good fit if:

  • It’s a senior position
  • They’ll be the only DBA
  • You expect them to be autonomous

This question has an autopilot answer, too. Everyone says they’ll set up log backups 15 minutes apart.

That doesn’t make sense when you can’t afford more than 5 minutes of data loss, because you can lose 3x that amount, or worse.

If Logging Is Without You


Aside from their answer, the questions they ask you when you tell them what you want are a good barometer of seniority.

  • What recovery model are the databases in?
  • How many databases are on the server?
  • How large are the databases and log files?

Questions like these let you know that they’ve had to set up some tricky backups in the past. But more than that, they let you know they’re listening to you.

Thanks for reading!