Troubleshooting SQL Server Scheduling and Yielding

This post has been republished via RSS; it originally appeared at: SQL Server Support articles.

First published on MSDN on Apr 05, 2018
Writer: Simon Su
Technical Reviewer: Pam Lahoud , Sourabh Agarwal, Tejas Shah


Scheduling and Y ielding Knowledge R ecap

We all know that SQL server is a multi-threads and multi-tasks system and it has its own thread scheduling mechanism which is a small part of job of what we call SQLOS. If you are not familiar with SQLOS you can refer to below two links for the details:

A new platform layer in SQL Server 2005 to exploit new hardware capabilities and their trends
https://blogs.msdn.microsoft.com/slavao/2005/07/20/platform-layer-for-sql-server

Inside the SQL Server 2000 User Mode Scheduler
https://msdn.microsoft.com/library/aa175393.aspx

How To Diagnose and Correct Errors 17883, 17884, 17887, and 17888
https://technet.microsoft.com/en-us/library/cc917684.aspx



Inside SQL server source code, there are many voluntary yield points to make multi-threads run efficiently and cooperatively. If a SQL Server worker thread does not voluntarily yield, it will likely prevent other threads from running on the same scheduler. When the owner of the scheduler has not yielded within 60 seconds and as a result pending requests (tasks) are stalled, SQL Server will log “non-yielding scheduler error in error log like below:



2018-03-10 21:16:35.89 Server      ***********************************************
2018-03-10 21:16:35.89 Server      *
2018-03-10 21:16:35.89 Server      * BEGIN STACK DUMP:
2018-03-10 21:16:35.89 Server      *   03/10/18 21:16:35 spid 22548
2018-03-10 21:16:35.89 Server      *
2018-03-10 21:16:35.89 Server      * Non-yielding Scheduler
2018-03-10 21:16:35.89 Server      *
2018-03-10 21:16:35.89 Server      ***********************************************



And a mini dump is generated in the LOG folder. You can contact Microsoft support to understand the details of the mini dump and check whether this non-yielding scheduler warning is a serious issue or it can be safely ignored.



How long a thread has been running on scheduler without yield ing

Regarding SQL server scheduling and yielding all things sound perfect until I got deeply involved in a customer’s case to troubleshoot a sudden transaction drop issue. I will write another post to share the story of that. In that case I need to find out how long a SQL server worker thread has been running on a scheduler without yielding. We know if non-yielding condition exceeds 60 seconds SQL server will log non-yielding error accordingly. However how about those threads that have been running on the schedulers without yielding but less than 60 seconds? Do we have a way to get the detail information of these non-yielding threads? Note that you cannot use the CPU time column of a statement in profiler trace because the CPU time of a query doesn’t mean it has been exclusively running on the CPU without yielding for that long. Some yields could have occurred among the life cycle of a query execution and SQL server records these yields with “SOS_SCHEDULER_YIELD” wait type.



SQL server worker has a quantum target of 4ms (milliseconds), however due to the complexity of the reality there could be places that the code runs unexpectedly long before it reaches to the voluntary yield point. Normally this is not a problem because that thread will eventually yield without starving the runnable list threads. In case we really need to know the details of this kind of things (like how long a thread has been running on the CPU without yielding) we can use below approaches.

Before talking about the method to troubleshoot this kind of scheduling and yielding issue let’s see how a scheduler and its tasks look like:



Here is the logic how scheduling works:

  1. When the task is running, it is in “Running” state and it is the active worker of the scheduler.

  2. When the task waits for nothing but CPU to run it is put into “Runnable” queue of a scheduler

  3. When the task is waiting for something (like lock,disk I/O, etc) it is in “Suspended” state.

  4. If the suspended task finishes the waits (wait for nothing) and is ready to run, it is put to the end of the Runnable queue.

  5. If the running thread voluntarily yields it is put back to end of Runnable queue.

  6. If the running thread needs to wait for something it is switched out of the scheduler and is put into suspended status.

  7. If the running thread finishes its work then the top thread of the runnable queue becomes the “Running” thread.




Resource Wait Time

For a suspended task if it is waiting for something we have lots of way to get the wait related information like wait_time, wait_resource etc. For example, both DMVs of sys.dm_os_waiting_tasks or sys.dm_exec_requests can tell the detail wait statistics:



SELECT session_id,status,wait_time,wait_type,wait_resource,last_wait_type
FROM sys.dm_exec_requests
WHERE session_id=52

Result:





Signal W ait T ime

If you query “sys.dm_os_wait_stats” you will find a column called “signal_wait_time_ms”. Signal wait time is the time a thread spent on the scheduler’s “runnable” list waiting to get on the CPU and run again. sys.dm_os_wait_stats output gives you an overall picture of waits for each wait type including the  signal wait time. If you want to get the detailed information of signal wait time of each individual session you can leverage Xevent wait_info and wait_info_external. There is an excellent article discussing how to use wait_info event to trace REDO latency:

https://blogs.msdn.microsoft.com/alwaysonpro/2015/01/06/troubleshooting-redo-queue-build-up-data-latency-issues-on-alwayson-readable-secondary-replicas-using-the-wait_info-extended-event/

The same approach applies to all other waits. I use below steps to simulate signal waits:

1. Create a table in tempdb

USE tempdb
CREATE TABLE t1 (c1 int)

2. Alter SQL server to use only one CPU (never do this on production server!):

EXEC SP_CONFIGURE  'affinity mask',2 --use only the second CPU of the system
RECONFIGURE WITH OVERRIDE

3. Now start the xevent trace:

IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name LIKE 'SignalWaitDemo')
DROP EVENT SESSION [SignalWaitDemo] ON SERVER
GO
CREATE EVENT SESSION [SignalWaitDemo] ON SERVER
ADD EVENT sqlos.wait_info(
ACTION(sqlos.scheduler_id,sqlserver.database_id,sqlserver.session_id)
--Capture End event (opcode=1) only
WHERE ([package0].[equal_uint64]([opcode],(1))
--Only capture user sessions (session_id>=50)
AND [package0].[greater_than_equal_uint64]([sqlserver].[session_id],(50))
--You can change duration to bigger value, say, change below 10 ms to 3000ms
AND [duration]>=(10)))
ADD TARGET package0.event_file(SET filename=N'E:\temp\Wait_Info_Demo.xel')
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30  SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)

GO

ALTER EVENT SESSION [SignalWaitDemo] ON SERVER STATE=START;

4. Then use ostress.exe tool to simulate workload to SQL server:

ostress -n1000 -r200 -S. –isignal_wait.sql

In signal_wait.sql, I have below query inside:

SET NOCOUNT ON
USE tempdb
DECLARE @I int=0,@k int=0


BEGIN
IF(rand()>0.9)update t1 set c1=c1+10*(rand()-0.5)
DECLARE @document varchar(64);
SELECT @document = 'Reflectors are vital safety' +
' components of your bicycle.';
DECLARE @j int
SET @j=CHARINDEX('bicycle', @document);
SET @j=CHARINDEX('are', @document);
SET @j=CHARINDEX('vital', @document);
SET @j=CHARINDEX('dd', @document);
SET @j=CHARINDEX('your', @document);
END

When the ostress tool is running, I query “select * from sys.dm_exec_requests where session_id>50” and I got below output:



You notice there are lots of runnable threads as well as suspended threads. Those suspended threads are the threads that are waiting for UPDATE lock while those runnable ones are waiting for scheduler to run.

I stop the Xevent SignalWaitDemo trace and I got below result:



You see that there are very long signal_duration sessions in the result which means they are in runnable queue for that long time.



Non-yielding Scheduler Time

From above description, we know how to check the resource waits time and signal waits time. Now comes to the million dollars question, how to know how long a thread has been running on a given scheduler without yielding (I call it Non-yielding Scheduler Time)?

Note that non-yielding scheduler time means how long a thread is occupying the scheduler without any yielding. It doesn’t always means the CPU execution time.  The thread holding the scheduler may be held by operation system if other application is using the same CPU at that time point. This is not common since in most cases it is a dedicated server for SQL server and no other heavy application running on the machine.

The fact is that we do not have a handy way to track how long a thread is actually continuously running on a scheduler without yielding. I would expect there is a max_non_yielding_scheduler_time_ms column somewhere in DMV but there is no at this moment.



The good news is that we have yield_count in DMV of “sys.dm_os_schedulers” like below:

select yield_count, runnable_tasks_count, * from sys.dm_os_schedulers where scheduler_id<1024



If the scheduler yields the yield_count will be increased by one. We can query this DMV regularly and get the delta of this column. If the yield_count doesn’t change during the monitoring time delta we know that someone is running on the scheduler for that period of time.

For example:

At timestamp1, yield_count is 33555

After 5 seconds, we query it again, and if the yield_count is still the same, 33555, then we know that a thread has holding the scheduler for at least 5 seconds.

Once we get the non-yielding scheduler, we can use scheduler’s active worker to inner join sys.dm_os_workers to get the active task, and use the active task to join the sys.dm_exec_requests to get the related user session information. A scheduler’s active worker is the worker thread that is current running on the scheduler, which is normally the running session running on the scheduler.

Here is the script to save the yield_count and other related information to a permanent table called “yields”. The script will run with every specified interval until you manually stop it:

USE <yourdb>
CREATE TABLE yields
(runtime datetime, scheduler_id bigint,yield_count bigint,runnable int, session_id int,start_time datetime,command varchar(200),database_id int)

GO

SET NOCOUNT ON
WHILE(1=1)
BEGIN
INSERT INTO yields
SELECT getdate() 'runtime', a.scheduler_id, a.yield_count, runnable_tasks_count, session_id,start_time, command,database_id
FROM sys.dm_os_schedulers a
inner join sys.dm_os_workers b on a.active_worker_address=b.worker_address
left join sys.dm_exec_requests c on c.task_address=b.task_address
--Most system has less than 1024 cores, use this to ignore those HIDDEN schedulers
WHERE a.scheduler_id<1024
--Monitor it every 5 seconds. you can change it to meet your needs
WAITFOR DELAY '00:00:05'
END

To get interesting non-yielding scheduler information out of table yields, I use below script. It is not the perfect one but it can give you the idea how to get  meaningful information from the captured data.

DECLARE scheduler_cur CURSOR
FOR SELECT scheduler_id from yields group by scheduler_id order by scheduler_id
OPEN scheduler_cur
DECLARE @id bigint
FETCH NEXT  FROM scheduler_cur INTO @id
WHILE (@@FETCH_STATUS=0)
BEGIN
DECLARE delta_cur CURSOR
FOR SELECT runtime, yield_count,scheduler_id,runnable,session_id,start_time, command,database_id
FROM yields WHERE scheduler_id=1  ORDER BY runtime ASC
OPEN delta_cur
DECLARE @runtime_previous datetime,@yieldcount_previous bigint
DECLARE @runtime datetime,@yieldcount bigint,@scheduler_id bigint,@runnable int,@session_id int,@start_time datetime,@command varchar(200),@database_id int


FETCH NEXT FROM delta_cur INTO  @runtime ,@yieldcount ,@scheduler_id,@runnable ,@session_id ,@start_time,@command,@database_id
SET @runtime_previous=@runtime;SET @yieldcount_previous=@yieldcount
FETCH NEXT FROM delta_cur INTO  @runtime ,@yieldcount ,@scheduler_id ,@runnable,@session_id ,@start_time,@command ,@database_id

WHILE(@@FETCH_STATUS=0)
BEGIN
--We find one non-yielding scheduler during the runtime delta
IF(@yieldcount=@yieldcount_previous)
BEGIN
PRINT 'Non-yielding Scheduler Time delta found!'
SELECT @runtime_previous 'runtime_previous', @runtime 'runtime', datediff(second, @runtime_previous,@runtime) 'non_yielding_scheduler_time_second', @yieldcount_previous 'yieldcount_previous',
@yieldcount 'yieldcount' ,@scheduler_id 'scheduler_id',@runnable 'runnable_tasks' ,@session_id 'session_id' ,@start_time 'start_time',

@command 'command' ,@database_id  'database_id'
END

-- print @id
SET @runtime_previous=@runtime;SET @yieldcount_previous=@yieldcount
FETCH NEXT FROM delta_cur INTO  @runtime ,@yieldcount ,@scheduler_id,@runnable ,@session_id ,@start_time,@command ,@database_id

END

CLOSE delta_cur
DEALLOCATE delta_cur
FETCH NEXT  FROM scheduler_cur INTO @id

END
CLOSE scheduler_cur
DEALLOCATE scheduler_cur

The output looks like below:



From above output you can see that scheduler 1 has non_yielding_scheduler_time for several times. Actually that scheduler 1 is hang because I suspended its active worker thread in debugger.



If you want to capture more information about the user session, like application name, hostname, etc you can run profiler trace or xevent at the same time to capture those events, and then you can correlate information with yields table to drill down further.



Leave a Reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.