Extracting Only Letters Or Numbers From A String

All The Bad

There’s a lot of terrible advice out there about how to do this. It’s terrible because it involves the kinds of functions that really hurt performance: the scalar and multi-statement variety.

Worse, they’re usually full of while loops that build strings. These kinds of scalar functions aren’t currently eligible for inlining in 2019 either, so don’t think you’re going to get helped there, because they build strings.

SELECT with variable accumulation/aggregation (for example, SELECT @val += col1 FROM table1) is not supported for inlining.

Ain’t Perfect

I don’t think my solutions are perfect. Heck, doing this with T-SQL at all is a bad idea. You should be using CLR for this, but CLR has had so little support or betterment over the years, I don’t blame you for not embracing it. My dear friend Josh has taken the liberty of doing this part for you.

It would be nice if SQL Server had the kind of native support for writing in other languages that free databases do (especially since SQL Server supports Python, R, and Java now). But you know, we really needed uh… Well, just pick any dead-end feature that’s been added since 2005 or so.

My solutions use a numbers table. You’re free to try replacing that aspect of them with an inlined version like Jeff Moden uses in his string splitter, but I found the numbers table approach faster. Granted, it’s also less portable, but that’s a trade-off I’m willing to make.

What I don’t like about either solution is that I have to re-assemble the string using XML PATH. If you’ve got another way to do that, I’m all ears. I know 2017 has STRING_AGG, but that didn’t turn out much better, and it wouldn’t be usable in other supported versions.

Both scripts are hosted on my GitHub repo. I don’t want to set the example of using a blog post as version control.

Getting Numbers

If you need to isolate only the numbers from a string:

SELECT u.DisplayName, gn.*
FROM dbo.Users AS u
    CROSS APPLY dbo.get_numbers(u.DisplayName) AS gn
WHERE u.Reputation = 11;

Getting Letters

If you need to get just the letters from a string:

SELECT u.DisplayName, gl.*
FROM dbo.Users AS u
    CROSS APPLY dbo.get_letters(u.DisplayName) AS gl
WHERE u.Reputation = 11;

Complaint Department

If you’ve got ideas, bugs, or anything else, please let me know on GitHub. I realize that both scripts have holes in them, but you may find them good enough to get you where you’re going.

Thanks for reading!

4 thoughts on “Extracting Only Letters Or Numbers From A String

  1. Thanks Erik!

    FYI the binary collation won’t match accented or extended letters in the Latin character set. I tested with Latin1_General_CI_AI and it does return these.

Leave a Reply

Your email address will not be published. Required fields are marked *