This post has been republished via RSS; it originally appeared at: Azure Database Support Blog articles.
Some time ago, we encountered a support case where a customer experienced significant delays in updating auto-created and user-created statistics. I would like to share the insights gained from this experience, including the underlying causes of the issue and the potential solutions we identified and implemented to address the problem effectively.
The Scenario
Consider the following table:
We added rows to this table and rebuilt the clustered index:
To obtain the total number of rows filtered by the Engineering department, we executed the following T-SQL:
With auto-create statistics enabled in our database, SQL Server created an auto-created statistic using the Department column for this table.
Following this, we found that the performance was not optimal, and upon reviewing the execution plan, we discovered that creating an index on the Department column would improve performance by approximately 82%. Consequently, the index was created:
At this point, we observed the following:
- Two statistics were present: _WA_Sys_00000004_09C5A64A and Index_Departament.
- The execution plan would utilize and update (if needed, with auto-update statistics enabled) the Index_Departament statistics.
- The _WA_Sys_00000004_09C5A64A statistic would never be used.
- However, when our customer used sp_updatestats or another tool that required updating all auto/user-created statistics, this statistic would be recalculated, which could be time-consuming depending on the number of rows.
To address this, we advised our customer to check for potentially duplicated statistics and to review each one to determine whether it could be deleted. As an example, we provided the following T-SQL to identify all statistics for this table where a column is involved in any statistics, allowing for later review and decision-making on whether to retain or delete them.
It's important to note that manual review is often the best approach to deciding on the maintenance of statistics, as a column may be used in multiple definitions. Additionally, it's crucial to test any changes in a test environment before implementation to avoid potential performance issues:
Here's an example of the output
Enjoy!