Site icon

Lesson Learned #414: Demystifying Execution Plans and Query Optimization in Azure SQL Database

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

In the world of database management, achieving optimal query performance is a crucial goal.As data volumes grow and applications become more complex, understanding how queries are executed and optimized becomes paramount. This is where execution plans come into play. In this article, I would like to give an example, why a query might have a new execution plan. Of course, we have many factors that influence them but I would like to explain a real scenario working with a customer and why the execution plan changed. 


What are Execution Plans?


An execution plan is a strategic roadmap that the database engine constructs to execute a SQL query efficiently. Think of it as a blueprint that guides the database engine in choosing the best path to retrieve and manipulate data. The execution plan outlines the sequence of operations, such as scanning tables, filtering rows, and joining data, necessary to fulfill the query.


Factors Influencing Execution Plans


Several factors influence the generation of execution plans, directly impacting query performance:

1. Statistics

Statistics play a crucial role in query optimization. They provide insights into the distribution of data within tables, helping the query optimizer make informed decisions about the most efficient way to access and manipulate data.

2. Indexes

Indexes significantly affect execution plans. They provide shortcuts for retrieving data by pre-sorting and organizing table data. The query optimizer considers index availability and selects plans that utilize them effectively.

3. Data Distribution

The distribution of data across tables affects execution plans. Uneven data distribution can lead to suboptimal plan choices, resulting in slow query performance.

4. Query Complexity

The complexity of a query influences the optimizer's plan selection. Complex queries may have multiple execution plan possibilities, and the optimizer must evaluate trade-offs to determine the most efficient approach.

5. Database Design

A well-designed database schema can promote efficient execution plans. Properly normalized tables and relationships contribute to better query performance.


Query Optimization Process


The query optimization process in Azure SQL Database involves a series of steps aimed at finding the optimal execution plan:

  1. Parsing: The query is parsed to identify its structure and semantics.

  2. Algebraic Simplification: The query is transformed using algebraic rules to simplify its structure without altering the result.

  3. Logical Optimization: Different logical plans are generated, representing various ways to execute the query.

  4. Physical Optimization: The logical plans are transformed into physical execution plans that outline the specific steps required for data retrieval.

  5. Cost-Based Optimization: The optimizer estimates the cost of each physical plan based on factors like I/O, CPU usage, and memory usage. The plan with the lowest estimated cost is chosen.


Capturing and Analyzing Execution Plans

To analyze execution plans, you can use these techniques:


Using the SET STATISTICS XML ON command before executing a query captures the execution plan XML. This XML provides a detailed breakdown of the plan's components, operators, and estimated costs.

2. Query Store

Azure SQL Database offers the Query Store feature, which automatically captures execution plans for queries. It provides historical data on plan changes, query performance, and plan choice reasons.


Analyzing Execution Plans


Analyzing execution plans involves understanding their components:

  1. Operators: Operators represent specific actions performed on data, such as scans, joins, and filters.

  2. Iterators: Iterators implement operators and execute specific actions on data, following the plan's instructions.

  3. Estimates: Estimates provide information about expected row counts, helping to evaluate plan efficiency.


Optimizing Query Performance

To optimize query performance using execution plans:

1. Index Optimization

Create and maintain appropriate indexes based on query patterns to enhance data retrieval efficiency.

2. Statistics Maintenance

Regularly update statistics to ensure the query optimizer has accurate information for plan selection.

3. Query Rewriting

Rewrite complex queries to simplify their structure and reduce plan complexity.

4. Query Hints

Use query hints to guide the optimizer's decision-making process and enforce specific execution plans when necessary.


Playing with an example:


In this scenario, we have the following table definition:




------------------------------------------------- -- Different Execution plans due to: ----- auto-created statistics by SQL. ----- Force Plan ----- An Index ------------------------------------------------- DROP TABLE IF EXISTS [dbo].[Notes] CREATE TABLE [dbo].[Notes]( [ID] [int] NULL, [NAME] [varchar](200) NULL, [id2] [int] IDENTITY(1,1) NOT NULL, CONSTRAINT [PK_Notes] PRIMARY KEY CLUSTERED ([id2] ASC)) ------------------------------------------------- -- Create the store procedure -------------------------------------------------- CREATE OR ALTER PROCEDURE dbo.GiveNotes @N AS INT = null AS SELECT count(Name),name FROM Notes where ID<@n group by Name -------------------------------------------------- -- Insert data -------------------------------------------------- INSERT INTO Notes (ID,Name) SELECT RAND()*(100000 - 1) + 1, 'Info:'+convert(varchar(200),RAND()*(100000 - 1) + 1) INSERT INTO Notes (ID,Name) SELECT RAND()*(100000 - 1) + 1, 'Info:'+convert(varchar(200),RAND()*(100000 - 1) + 1) FROM Notes



Also, we have the following Python code:



import os #os.environ['TDSDUMP'] = 'stdout' import pyodbc import time import random def ConnectToTheDB(timeout): try: print('Connecting to the DB') start_time = time.time() conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};,1433;UID=XXX;PWD=XXX;database=dbname;APP=Test-Python") conn.timeout=timeout print("Connected to the Database %s seconds ---" % ((time.time() - start_time)) ) return conn except BaseException as e: print("An error occurred connecting to the DB - " + format(e)) return def RunDifferentExecutionPlan(): try: conn = ConnectToTheDB(0) cursor = conn.cursor() nLoop=1 while nLoop<100000: nLoop=nLoop+1 numero = random.randint(1, 600000) print('Run Different Execution Plan starting up....' + str(numero)) start_time = time.time() cursor.execute("EXEC dbo.GiveNotes " + str(numero)) print("- (Value1) Execution Time: %s seconds ---" % ((time.time() - start_time)) ) conn.close() except BaseException as e: print("An error executing the command - " + format(e)) finally: print('Run Different Execution Plan finished....') RunDifferentExecutionPlan()




Starting the Python application, we could see the query is taking a few ms and results are:


Checking the Query Data Store we could see execution plan



So, in this situation, checking the statistics we could see that Azure SQL Database created two statistics. The distribution of the data is very low:



So, let's try to add rows here running the following query: INSERT INTO Notes (ID,Name) SELECT RAND()*(100000 - 1) + 1, 'Info:'+convert(varchar(200),RAND()*(100000 - 1) + 1) FROM Notes


Right now, our query took the following time:


and right now, we could see another execution execution plan:



Why? we have a new statistics that cause a new statistics for the modifications that we have. 



Other example that we have is depending on the number of rows in the table, even forcing the execution query, we could have a new execution plan that we need to create a missing index. 



That's mean that adding a new index, we could pass from 






Exit mobile version