Lesson Learned #266: Avoiding duplicate statistics

Posted by

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

Today, I worked on a service request that our customer reported that the statistics update maintenance plan is taking too much time. This is causing that all these operations (rebuild and update statistics) are taking more time that the maintenace windows that they have. I would like to share with you my findings how to reduce this time.  

 

I found we're updating too many statistics and we worked in two main operations:

 

  • Avoiding to update statistics of are related to indexes: 
    • I saw that our customer is rebuilding the indexes and after it updating all statistics. If you already rebuilt the indexes remember that the statistics associated with this index will be updated too. So, there is not needed to invest resources in terms of memory, CPU and IO to perform this operation. 
  • Duplicate Statistics:
    • I saw a high number of statistics in their table, around 250 statistics and also, they have a high number of indexes in the table. I asked if they created these indexes based on performance needs or they executed previously a performance test with their application to find out these indexes. They mentioned that they are creating indexes depending on performance basis. 
    • In this case, I suggested to review if they have already an user/automatic statistics created before creating any index, Because they could have duplicate statistics one user/automatic statistics and another one index related. For example, 

 

We have the following table:

 

CREATE TABLE MyData (ID INT IDENTITY(1,1) PRIMARY KEY, [Name] VARCHAR(80)) INSERT INTO myData ([name]) valueS(replicate('Y',70)); GO 1000 INSERT INTO myData ([name]) valueS(replicate('M',70)); GO 1000 INSERT INTO myData ([name]) valueS(replicate('C',70)); GO 1000 INSERT INTO myData ([name]) valueS(replicate('A',70)); GO 1000

 

if our application runs any query using the column name, for example, SELECT * FROM MyData where [name]='YMCA', we are going to see that an automatic statistics has been created for this column called _WA_Sys_00000002_2116E6DF

 

Jose_Manuel_Jurado_0-1672832761148.png

 

Jose_Manuel_Jurado_1-1672832802993.png

 

Futher, if based on performance needs you need to create an index in this table based on this column, you are going to have the following situation:

 

CREATE INDEX MyData_ix1 ON MyData ( Name )

 

2 statistics: One related with user/automatic statistic and other one related with index. 

 

Jose_Manuel_Jurado_2-1672833003335.png

 

So, dropping the unnecessary automatic statistics, our customer reduces time for updating the statistics. In this case, my lesson learned here, is check if you have an existing user/automatic statistics to avoid this scenario and check your script to avoid this situation.

 

Enjoy!

 

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.