Multi-column statistics

This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs - .

In post SQL Server CE: Multiple single-column statistics, I discussed how SQL Server calculates Multiple single-column statistics. I'm going to talk about the 'multiple-columns statistics' today.

 

 

Here are examples , adventure 2019 OLTP  database is used in this example

 

------- Data manipulation-------------------------------------------

alter database [AdventureWorks2019] set compatibility_level=150

go

use [AdventureWorks2019]

go

if exists(select 1 from sys.tables where name='SalesOrderDetail')

drop table SalesOrderDetail

go

select * into SalesOrderDetail from Sales.SalesOrderDetail---- import all the data into new table SalesOrderDetail.

Go

--Create two statistics explicitly. The first statistics has two columns

create statistics I_ProductID_UnitPrice on SalesOrderDetail(ProductID,UnitPrice) with fullscan

create statistics I_UnitPrice  on SalesOrderDetail(UnitPrice) with fullscan

------- Data manipulation-------------------------------------------

 

NEW CE in different database compatibility  behaves differently, I'm going to demo by one by.

 

 

NEW CE-2017/2019-db_compatibility_level>=140

      a)The selectivity of combined multi single-column stats, the formula is: MAX(  min('all density',p0,p1,p2,p3),   p0*p1*p2*p3)

      b)'All density' of the combined columns  in  multi-column statistics

      c)p0,p1,p2,p3 are the selectivity of each value of the column in WHERE clause and P0<p1<p2<p3.

      d)If the multi-columns has more than 4 columns, it only counts the first 4, the rest of them are ignored.

 

1.New CE with compatibility_level 150

alter database [AdventureWorks2019] set compatibility_level=150

select * from SalesOrderDetail where ProductID=870 and UnitPrice=4.99 option(recompile)

Liwei_1-1667311216720.png

 

 

 

  1)The estimated row is 341.

  2)Here is the estimated row formula:   max(  min('All density of two columns',P1,P2),P1*P2)  *cardinality.

       a)All density of two columns is :0.001321004

       b)p0 is the selectivity of statistics of one column(ProductID) in where clause, which is 4688/121317=0.03864256

       c)p1 is the selectivity of statistics of the rest of column(UnitPrice),                    which is 8827/121317=0.07275979

       d)cardinality is 121317

dbcc show_statistics(SalesOrderDetail,I_ProductID_UnitPrice)

Liwei_2-1667311216721.png

 

 

dbcc show_statistics(SalesOrderDetail,I_UnitPrice)

Liwei_3-1667311216723.png

 

  3)Let me pass the values to the formula:  max(  min('All density of two columns',p0,p1),p0*p1)  *cardinality

max(  min(0.001321004,0.03864256,0.07275979),0.03864256*0.07275979)*121317

=max(0.001321004,0.03864256*0.07275979)*121317

=max(0.001321004,0.0028116245506624)*121317

=0.0028116245506624*121317=341.09785561, rounded down to 341.

   4)Enable trace flag 2363 gives you more detail

Dbcc traceon(3604,2363)

------------trace flag 2363 output-----------------

Loaded histogram for column QCOL: [AdventureWorks2019].[dbo].[SalesOrderDetail].ProductID from stats with id 2

Loaded histogram for column QCOL: [AdventureWorks2019].[dbo].[SalesOrderDetail].UnitPrice from stats with id 3

Cardinality using multi-column statistics 0.001321 and with independence assumption 0.00281163. Picking cardinality 0.00281163

Selectivity: 0.00281163

Stats collection generated:

  CStCollFilter(ID=3, CARD=341.098)

      CStCollBaseTable(ID=1, CARD=121317 TBL: SalesOrderDetail)

End selectivity computation

          ------------trace flag 2363 output-----------------

 

 

NEW CE-2016-db_compatibility_level=130

     a)The selectivity of multi single-column stats, the formula is: min('all density',p0,p1,p2,p3)

     b)'All density' is the one for all columns  in  multi-column statistics

     c)p0,p1,p2,p3 are the selectivity of each value of the column in WHERE clause and P0<p1<p2<p3.

     d)If the multi-columns has more than 4 columns, it only counts the first 4, the rest of them are ignored.

 

Note, if trace flag 4199 is enabled, this rule is as same as the rule in DB_compatibility_level 140/150

 

1.New CE with compatibility_level 130

alter database [AdventureWorks2019] set compatibility_level=130

select * from SalesOrderDetail where ProductID=870 and UnitPrice=4.99 option(recompile)

Liwei_4-1667311449598.png

 

 

 

 

  1)The estimated row is 160.

  2)Here is the estimated row formula:   min('all density',p0,p1,p2,p3)  *cardinality.

       a)All density of two columns is :0.001321004

       b)p0 is the selectivity of statistics of one column(ProductID) in where clause, which is 4688/121317=0.03864256

       c)p1 is the selectivity of statistics of the rest of column(UnitPrice),                    which is 8827/121317=0.07275979

       d)cardinality is 121317

dbcc show_statistics(SalesOrderDetail,I_ProductID_UnitPrice)

Liwei_5-1667311449600.png

 

 

dbcc show_statistics(SalesOrderDetail,I_UnitPrice)

Liwei_6-1667311449601.png

 

  3)Let me pass the values to the formula:   min('All density of two columns',p0,p1)  *cardinality

MIN(0.001321004,0.03864256,0.07275979)*121317=0.001321004*121317=160.260242268, is rounded down to 160.

   4)Enable trace flag 2363 gives you more detail

Dbcc traceon(3604,2363,)

------------trace flag 2363 output-----------------

Loaded histogram for column QCOL: [AdventureWorks2019].[dbo].[SalesOrderDetail].ProductID from stats with id 2

Loaded histogram for column QCOL: [AdventureWorks2019].[dbo].[SalesOrderDetail].UnitPrice from stats with id 3

Selectivity: 0.001321

Stats collection generated:

  CStCollFilter(ID=3, CARD=160.26)

      CStCollBaseTable(ID=1, CARD=121317 TBL: SalesOrderDetail)

End selectivity computation

          ------------trace flag 2363 output-----------------

 

NEW CE-2014-db_compatibility_level=120

SQL 2014 is not aware of Multi-columns stats, the algorithm is as same as Multiple single column statistics.formula is: p0 * p1^(1/2) * p2^(1/4)* p3^(1/8)

 

Note, if trace flag 4199 is enabled, this rule is as same as the rule in DB_compatibility_level 140/150

 

1.New CE with compatibility_level 120

alter database [AdventureWorks2019] set compatibility_level=120

select * from SalesOrderDetail where ProductID=870 and UnitPrice=4.99 option(recompile)

Liwei_7-1667311557565.png

 

 

 

 

  1)The estimated row is 12650.

  2)Here is the estimated row formula:   p0 * p1^(1/2) *cardinality.

       a)All density of two columns is :0.001321004

       b)p0 is the selectivity of statistics of one column(ProductID) in where clause, which is 4688/121317=0.03864256

       c)p1 is the selectivity of statistics of the rest of column(UnitPrice),                    which is 8827/121317=0.07275979

       d)cardinality is 121317

dbcc show_statistics(SalesOrderDetail,I_ProductID_UnitPrice)

Liwei_8-1667311557568.png

 

 

dbcc show_statistics(SalesOrderDetail,I_UnitPrice)

Liwei_9-1667311557570.png

 

  3)Let me pass the values to the formula:    p0 * p1^(1/2)  *cardinality

0.03864256*0.07275979^(1/2)*121317=0.0104235*121317=1264.547750, is rounded up to 1265.

   4)Enable trace flag 2363 gives you more detail

Dbcc traceon(3604,2363,)

------------trace flag 2363 output-----------------

Loaded histogram for column QCOL: [AdventureWorks2019].[dbo].[SalesOrderDetail].ProductID from stats with id 2

Loaded histogram for column QCOL: [AdventureWorks2019].[dbo].[SalesOrderDetail].UnitPrice from stats with id 3

Selectivity: 0.0104235

Stats collection generated:

  CStCollFilter(ID=2, CARD=1264.54)

      CStCollBaseTable(ID=1, CARD=121317 TBL: SalesOrderDetail)

End selectivity computation

          ------------trace flag 2363 output-----------------

 

Liwei_10-1667311955498.png

 

Leave a Reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.