Troubleshooting Security Cache Issues: USERSTORE_TOKENPERM And TokenAndPermUserStore

TokenAndPermUserStore


I’ve been seeing this issue on newer versions of SQL Server, from 2016 to 2019, particularly where some form of Auditing is running. What happens next is the plan cache clearing out entirely, queries taking a long time to compile, or both.

I’m still digging in more on the “why” here, but I have a decent amount of confidence in what I’ve found so far. This has occurred on several servers at different client sites, and doesn’t appear to be isolated.

Anyway, this query against the RING_BUFFER_SECURITY_CACHE ring buffer was helpful in figuring out when things were happening.

You can see a limited historical view of cache sizes and the number of entries in those caches.

SELECT 
    event_time = 
        DATEADD
        (
            MILLISECOND, 
            x.xml_record.value('(//Record/@time)[1]', 'BIGINT') - osi.ms_ticks, 
            GETDATE()
        ),
    tokenandperm_userstore_gb = 
        x.xml_record.value('(//Record/TokenAndPermUserStore/@pages_kb)[1]', 'BIGINT') / 1024. / 1024.,
    acr_cachestores_gb = 
        x.xml_record.value('(//Record/ACRCacheStores/@pages_kb)[1]', 'BIGINT') / 1024. / 1024.,
    x.xml_record
FROM 
(
    SELECT TOP (5000)
        dorb.timestamp,
        xml_record = 
            CONVERT
            (
                xml, 
                dorb.record
            )
    FROM sys.dm_os_ring_buffers AS dorb
    WHERE dorb.ring_buffer_type = 'RING_BUFFER_SECURITY_CACHE'
    ORDER BY dorb.timestamp DESC
) AS x
CROSS JOIN sys.dm_os_sys_info AS osi
ORDER BY x.timestamp DESC
OPTION(RECOMPILE);

Memory Clerks


Likewise, this much shorter query can be used to see how large your security cache currently is.

The ring buffers are only written to at intervals, so they may not be up to date.

SELECT 
    TokenAndPermUserStore_gb = 
        CONVERT(DECIMAL(38, 2), (pages_kb / 1024. / 1024.))
FROM sys.dm_os_memory_clerks
WHERE type = 'USERSTORE_TOKENPERM'
AND   name = 'TokenAndPermUserStore';

This is a more current view of things, and can be used to trend cache sizes. It will only work on SQL Server 2012 and up, because that’s when the pages_kb column was added to sys.dm_os_memory_clerks.

If you’re on a version of SQL Server prior to that, that’s your problem.
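If you really do need the same number on an older build, a rough equivalent is below. This is only a sketch, assuming the pre-2012 column names, single_pages_kb and multi_pages_kb, which pages_kb replaced:

/* Pre-2012 sketch: pages_kb doesn't exist yet, so sum the older columns */
SELECT 
    TokenAndPermUserStore_gb = 
        CONVERT(DECIMAL(38, 2), ((single_pages_kb + multi_pages_kb) / 1024. / 1024.))
FROM sys.dm_os_memory_clerks
WHERE type = 'USERSTORE_TOKENPERM'
AND   name = 'TokenAndPermUserStore';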

“Tipping Points”


While I don’t have any precise guidance on when the size or entries will cause an issue, it seems that the cache needs to hit several GB or more.

If it regularly does that and you experience performance issues when it does, then you might try running this to clear out the security cache:

DBCC FREESYSTEMCACHE('TokenAndPermUserStore');
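Since the tipping point seems to involve entries as well as size, it can also be worth trending the entry count. Here’s a minimal sketch using sys.dm_os_memory_cache_counters, which exposes an entries_count column for user stores:

/* Trend entry counts alongside size for the security cache */
SELECT 
    domcc.name,
    domcc.pages_kb,
    domcc.entries_count
FROM sys.dm_os_memory_cache_counters AS domcc
WHERE domcc.type = 'USERSTORE_TOKENPERM'
AND   domcc.name = 'TokenAndPermUserStore';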

I’ve tried using Trace Flags 4610 and 4618, and found that 4610 made things worse, and 4618 made things exactly the same.

YMMV.
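If you experiment with them anyway, it’s worth checking what’s already enabled first:

/* Check whether 4610 or 4618 are already enabled globally */
DBCC TRACESTATUS(4610, 4618, -1);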

Because I’ve found a scheduled job to be a more reliable fix, I’ve published one on GitHub. You can stick it in an Agent job, run it on whatever schedule you want, at whatever cache size threshold you want, and it will log its activity to a table. The sketch after this paragraph shows the general shape.
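This is only a sketch of the idea, not the published job: the dbo.cache_clearing_log table name and the 5 GB threshold here are made up for illustration.

/* Sketch of a cache-clearing job step.
   dbo.cache_clearing_log and the 5 GB threshold are hypothetical,
   stand-ins for whatever the real job uses. */
IF OBJECT_ID('dbo.cache_clearing_log') IS NULL
BEGIN
    CREATE TABLE dbo.cache_clearing_log
    (
        id INT IDENTITY PRIMARY KEY,
        cache_size_gb DECIMAL(38, 2) NOT NULL,
        cleared_date DATETIME NOT NULL DEFAULT GETDATE()
    );
END;

DECLARE 
    @cache_size_gb DECIMAL(38, 2),
    @threshold_gb DECIMAL(38, 2) = 5.;

SELECT 
    @cache_size_gb = 
        CONVERT(DECIMAL(38, 2), (domc.pages_kb / 1024. / 1024.))
FROM sys.dm_os_memory_clerks AS domc
WHERE domc.type = 'USERSTORE_TOKENPERM'
AND   domc.name = 'TokenAndPermUserStore';

IF @cache_size_gb >= @threshold_gb
BEGIN
    /* Log first, then clear */
    INSERT 
        dbo.cache_clearing_log (cache_size_gb)
    VALUES (@cache_size_gb);

    DBCC FREESYSTEMCACHE('TokenAndPermUserStore');
END;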

Reproductive


My friend Josh (b|t) hooked me up with a way to repro a growing security cache, if you want to test things out.

IF NOT EXISTS
(
    SELECT
        1/0
    FROM sys.database_principals AS dp
    WHERE dp.type = 'A'
    AND   dp.name = 'your_terrible_app'
)
BEGIN
    CREATE APPLICATION ROLE your_terrible_app
        WITH DEFAULT_SCHEMA = [dbo], 
        PASSWORD = N'y0ur_t3rr1bl3_4pp';
END

DECLARE 
    @counter INT = 0;

DECLARE 
    @holder table
    ( 
        id INT PRIMARY KEY IDENTITY,
        cache_size DECIMAL(10,2),
        run_date DATETIME 
    );

SET NOCOUNT ON;

WHILE @counter <= 20000
BEGIN
    DECLARE @cronut VARBINARY(8000);
    DECLARE @bl0b_eater SQL_VARIANT;

    EXEC sys.sp_setapprole 
        @rolename = 'your_terrible_app',
        @password = 'y0ur_t3rr1bl3_4pp',
        @fCreateCookie = true,
        @cookie = @cronut OUTPUT;

    SELECT 
        @bl0b_eater = USER_NAME();

    EXEC sys.sp_unsetapprole 
        @cronut;

    SELECT 
        @bl0b_eater = USER_NAME();

    IF @counter % 1000 = 0
    BEGIN
        INSERT 
            @holder (cache_size, run_date)
        SELECT 
            cache_size = 
                CONVERT
                (
                    DECIMAL(10, 2), 
                    (domc.pages_kb / 1024. / 1024.)
                ),
            run_date = 
                GETDATE()
        FROM sys.dm_os_memory_clerks AS domc
        WHERE domc.type = 'USERSTORE_TOKENPERM'
        AND   domc.name = 'TokenAndPermUserStore';

        RAISERROR('%i', 0, 1, @counter) WITH NOWAIT;
    END;

    SELECT 
        @counter += 1;
END;

DBCC FREESYSTEMCACHE('TokenAndPermUserStore');

SELECT 
    h.* 
FROM @holder AS h 
ORDER BY h.id;

DBCC TRACESTATUS;

DBCC TRACEON(4610, -1);
DBCC TRACEOFF(4610, -1);

DBCC TRACEON(4618, -1);
DBCC TRACEOFF(4618, -1);

DBCC TRACEON(4610, 4618, -1);
DBCC TRACEOFF(4610, 4618, -1);

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.



7 thoughts on “Troubleshooting Security Cache Issues: USERSTORE_TOKENPERM And TokenAndPermUserStore”

  1. I just want to be sure of something about this, as it sounds like this could be a problem we run into intermittently. When you say “where some form of Auditing is running,” are you talking about a Database or a Server Audit? I haven’t gone in to check the cache yet, but if it looks like this might be the problem, the job will be the next step.

    1. I’ve only ever seen it with server level audits, but I don’t see database level audits all that often. I’d guess it’s possible with them as well.
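
     If you want a quick way to check what’s defined at the server level, sys.server_audits lists them. A minimal sketch:

     SELECT
         sa.name,
         sa.type_desc,
         sa.is_state_enabled
     FROM sys.server_audits AS sa;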

  2. We have an issue where we ran DBCC FREESYSTEMCACHE('TokenAndPermUserStore') every minute, with a check to only run it if the count was over 10,000. It has run fine for a couple of years. Yesterday, instead of 2 seconds, it took 8 minutes to run, then 16 minutes, then just kept doubling, and the system was slow. We rebooted the box, ran the count again, and it is over 853,000, so we are trying to run it and it is again taking forever. We are on SQL Server 2017 and applied updates back in December. Any thoughts?

    1. Ooh that’s a tough one without a lot more details. If you’re not able to open a Microsoft support case, I’d ask a question over at dba.se with stuff like current SQL Server and Windows patch levels, any settings changes or trace flags you have enabled, etc. A few folks who work at Microsoft answer questions there, and they might be able to chime in.
