A couple of days ago, I was informed about performance degradation which periodically happened in our production system of MS SQL Server 2012 instance running under Windows Server 2012R2. QUERYSTORE would be good solution for monitoring performance for the situation, however it comes in SQL Server 2014 version. Because there is not QUERYSTORE technology in SQL Server 2012, lets create manual sampling by using BRENTOZAR script sp_BlitzFirst. I have created JOB which runs every 15 minute and calls the sp_BlitzFirst. Please see below:
USE [msdb] GO DECLARE @jobId BINARY(16) EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'DBAtools_Sampling', @enabled=0, @notify_level_eventlog=0, @notify_level_email=0, @notify_level_netsend=0, @notify_level_page=0, @delete_level=0, @description=N'No description available.', @category_name=N'[Uncategorized (Local)]', @owner_login_name=N'SA', @job_id = @jobId OUTPUT IF (@@ERROR <> 0 OR @ReturnCode<> 0) GOTO QuitWithRollback EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Step1', @step_id=1, @cmdexec_success_code=0, @on_success_action=1, @on_success_step_id=0, @on_fail_action=2, @on_fail_step_id=0, @retry_attempts=0, @retry_interval=0, @os_run_priority=0, @subsystem=N'TSQL', @command=N'EXEC sp_BlitzFirst @Seconds = 30, @OutputDatabaseName = ''DBAtools'', @OutputSchemaName = ''dbo'', @OutputTableName = ''BlitzFirst'', @OutputTableNameFileStats = ''BlitzFirst_FileStats'', @OutputTableNamePerfmonStats = ''BlitzFirst_PerfmonStats'', @OutputTableNameWaitStats = ''BlitzFirst_WaitStats'', @OutputTableNameBlitzCache = ''BlitzCache''', @database_name=N'DBAtools', @flags=0 IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1 IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Schedule1', @enabled=1, @freq_type=4, @freq_interval=1, @freq_subday_type=4, @freq_subday_interval=15, @freq_relative_interval=0, @freq_recurrence_factor=0, @active_start_date=20190620, @active_end_date=99991231, @active_start_time=0, @active_end_time=235959, @schedule_uid=N'1d27b7fb-ae33-4334-94f2-7b9101692760' IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)' IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback COMMIT TRANSACTION GOTO EndSave QuitWithRollback: IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION EndSave: GO
About sp_BlitzFirst you can read from BRENTOZR blog. It creates a couple of tables and views in the given database for periodically sampling (30 seconds in our situation) from SQL Server internal performance views like dm_os_wait_stats, dm_io_virtual_file_stats and etc.
After sampling we can select the blitzfirst_waitstats table and see which are the most occurred waits.
select [wait_type], sum([wait_time_ms]) / 1000 / 3600 wt from [dbatools].[dbo].[blitzfirst_waitstats] group by [wait_type] order by wt desc wait_type wt TRANSACTION_MUTEX 30024 BACKUPIO 2504 TRACEWRITE 1792 ASYNC_NETWORK_IO 1179 SOS_SCHEDULER_YIELD 906 CXPACKET 753 LCK_M_IS 696 LCK_M_SCH_M 630 WRITELOG 513 LCK_M_SCH_S 452 LCK_M_IX 319 ...........
As we see the TRANSACTION_MUTEX is the top wait. The Books Online definition of TRANSACTION_MUTEX
is that it “occurs during synchronization of access to a transaction by multiple batches. Lets go deeper and catch the query which generates TRANSACTION_MUTEX wait. For doing that I am going to use Extended Events technology.
First of all I got the code of TRANSACTION_MUTEX wait
SELECT [map_key] FROM sys.dm_xe_map_values WHERE [name] = N'wait_types' AND [map_value] = N'TRANSACTION_MUTEX'; map_key --------- 210
Then created Extended Event for catching Query which generate TRANSACTION_MUTEX wait.
CREATE EVENT SESSION [InvestigateWaits] ON SERVER ADD EVENT sqlos.wait_info( ACTION(sqlserver.database_id,sqlserver.plan_handle,sqlserver.session_id,sqlserver.sql_text,sqlserver.tsql_stack,sqlserver.username) WHERE ([wait_type]=(210) AND [opcode]=(1))) ADD TARGET package0.ring_buffer WITH (MAX_MEMORY=51200 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=5 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF) GO
Activate created Extended Event and open Watch Live Data page.
We should see window like below.
After monitoring some period of time, At last I have got that query. The query and its plan shown below.
SELECT [GroupBy1].[A1] AS [C1] FROM ( SELECT COUNT(1) AS [A1] FROM [dbo].[VW_DOCS] AS [Extent1] LEFT OUTER JOIN [dbo].[DOCS_READSTATUS] AS [Extent2] ON ([Extent1].[DocId] = [Extent2].[DocId]) AND ([Extent2].[WorkplaceId] = 24) WHERE ([Extent1].[DocDoctypeId] = 1) AND ([Extent1].[DocPeriodId] = 3) AND (([Extent1].[DocOrganizationId] = 24) OR ( EXISTS (SELECT 1 AS [C1] FROM [dbo].[DOCS_EXECUTOR] AS [Extent3] INNER JOIN [dbo].[DOCS_DIRECTIONS] AS [Extent4] ON [Extent3].[ExecutorDirectionId] = [Extent4].[DirectionId] WHERE ([Extent1].[DocId] = [Extent3].[ExecutorDocId]) AND ((([Extent3].[ExecutorWorkplaceId] = 24) AND (1 = [Extent4].[DirectionConfirmed])) OR (([Extent3].[ExecutorOrganizationId] = 24) AND (1 = [Extent4].[DirectionConfirmed]))) ))) AND (( NOT ((2 = CAST( [Extent1].[DocDeleted] AS int)) AND ([Extent1].[DocDeleted] IS NOT NULL))) OR ((2 = CAST( [Extent1].[DocDeleted] AS int)) AND ([Extent1].[DocInsertedById] = 24))) AND (1 = 1) ) AS [GroupBy1]
It is one of the most usable query and it takes about 3 seconds to complete. From the query plan we see expensive Nested Loop join which will generate concurrency in case of many query like this will be running. However, If we change join method to Hash Join by using SQL Server hint OPTION(HASH JOIN), we will decrease concurrency and query run time (less then one second). Corrected version of query and its plan shown below.
SELECT [GroupBy1].[A1] AS [C1] FROM ( SELECT COUNT(1) AS [A1] FROM [dbo].[VW_DOCS] AS [Extent1] LEFT OUTER JOIN [dbo].[DOCS_READSTATUS] AS [Extent2] ON ([Extent1].[DocId] = [Extent2].[DocId]) AND ([Extent2].[WorkplaceId] = 24) WHERE ([Extent1].[DocDoctypeId] = 1) AND ([Extent1].[DocPeriodId] = 3) AND (([Extent1].[DocOrganizationId] = 24) OR ( EXISTS (SELECT 1 AS [C1] FROM [dbo].[DOCS_EXECUTOR] AS [Extent3] INNER JOIN [dbo].[DOCS_DIRECTIONS] AS [Extent4] ON [Extent3].[ExecutorDirectionId] = [Extent4].[DirectionId] WHERE ([Extent1].[DocId] = [Extent3].[ExecutorDocId]) AND ((([Extent3].[ExecutorWorkplaceId] = 24) AND (1 = [Extent4].[DirectionConfirmed])) OR (([Extent3].[ExecutorOrganizationId] = 24) AND (1 = [Extent4].[DirectionConfirmed]))) ))) AND (( NOT ((2 = CAST( [Extent1].[DocDeleted] AS int)) AND ([Extent1].[DocDeleted] IS NOT NULL))) OR ((2 = CAST( [Extent1].[DocDeleted] AS int)) AND ([Extent1].[DocInsertedById] = 24))) AND (1 = 1) ) AS [GroupBy1] OPTION (HASH JOIN)
Problem has been fixed. Thank you for attention 🙂