The Purge SQL Agent Job for MDW takes a long time to complete

I use the dbWarden alerts to inform me if a SQL job is taking longer to complete than normal and I got one this morning:

Click to enlarge

I noticed by looking at the history this purge job was gradually taking longer and longer to complete each day since I installed it again (see mylast post on this):

RunDate	RunTime	Duration	ExecutionStatus	JobName
04/10/2013	02:00:01	00:00:45	Succeded	mdw_purge_data_[MDW]
05/10/2013	02:00:00	00:13:27	Succeded	mdw_purge_data_[MDW]
06/10/2013	02:00:00	00:17:03	Succeded	mdw_purge_data_[MDW]
07/10/2013	02:00:01	00:30:58	Succeded	mdw_purge_data_[MDW]
08/10/2013	02:00:01	01:04:25	Succeded	mdw_purge_data_[MDW]
09/10/2013	02:00:01	01:41:39	Succeded	mdw_purge_data_[MDW]
10/10/2013	02:00:01	02:35:17	Succeded	mdw_purge_data_[MDW]
11/10/2013	02:00:00	03:34:21	Succeded	mdw_purge_data_[MDW]
12/10/2013	02:00:01	04:12:51	Succeded	mdw_purge_data_[MDW]
13/10/2013	02:00:01	04:54:04	Succeded	mdw_purge_data_[MDW]
14/10/2013	02:00:00	05:38:58	Succeded	mdw_purge_data_[MDW]
15/10/2013	02:00:01	07:07:13	Succeded	mdw_purge_data_[MDW]
16/10/2013	02:00:00	08:21:11	Cancelled	mdw_purge_data_[MDW]

The last job I cancelled so I could figure out what was going on. A quick Google took me to this page which recommended adding a couple of indexes. Before I did so I took a look at the size of both the tables mentioned in the article:

sp_spaceused '[snapshots].[query_stats]'

name	rows	reserved	data	index_size	unused
query_stats	5,702,294	599072 KB	587616 KB	11320 KB	136 KB

sp_spaceused '[snapshots].[notable_query_text]'

name	rows	reserved	data	index_size	unused
notable_query_text	26,383	34872 KB	33616 KB	272 KB	984 KB

...And the execution plan of the following query before I made any changes:

SELECT qt.[sql_handle]

INTO #tmp_notable_query_text_test

FROM snapshots.notable_query_text AS qt

WHERE NOT EXISTS (

SELECT snapshot_id

FROM snapshots.query_stats AS qs

WHERE qs.[sql_handle] = qt.[sql_handle])

Execution Plan Before

Click to enlarge

You can clearly see that the scan of CIDX_query_stats costs by far the most of any operation in the query.

So I applied the following indexes as recommended in the article and looked at the execution plan of the query again:

USE [MDW]

SET ANSI_PADDING ON

CREATE NONCLUSTERED INDEX [IX_sql_handle]

ON [snapshots].[query_stats]

(

[sql_handle] ASC

)

CREATE NONCLUSTERED INDEX [IX_sql_handle]

ON [snapshots].[notable_query_text]

(

[sql_handle] ASC

)

Execution Plan After

Click to enlarge

Now various changes have happened to the query play including a Nested Loop operator replacing the Hash Match and an Index Seek replacing the Index Scan.

I ran the job manually and it completed much faster :

RunDate	RunTime	Duration	ExecutionStatus	JobName
16/10/2013	10:41:56	00:05:32	Succeded	mdw_purge_data_[MDW]

But much of the hard work would have already been done from the previous night’s schedule, so I will need to wait until tonight to see if the time had be reduced.

Update

It looks like the queries have done their job. The duration has greatly improved:

RunDate	RunTime	Duration	ExecutionStatus	JobName
17/10/2013	02:00:00	00:21:00	Succeded	mdw_purge_data_[MDW]

A word of warning

Even though the MDW has been active for only two weeks and I am monitoring only one medium-busy server, nearly 14GB of data has been generated:

Click to enlarge

Comments

Stefano Gioia26 November 2013 at 15:32
Very useful. I have many customers that simply LOVE the Data Collector interface. Unfortunately, DC is full of problems (random errors in collection jobs, database growing too much, report errors).
ReplyDelete
Replies

Add comment

SQL Server Snippets | Paul Hewson

Search This Blog