# SQL Server CE: Multiple single-column statistics connected by OR

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.

In post SQL Server CE: Multiple single-column statistics, I discussed the selectivity when AND is used to join the columns in Where clause. Today, I'm going to talk about the OR operator

Let's say we have four predicates connected by OR with selectivites P0,P1,P2 and P3,

Legacy CE

The combined selectivity is : 1-(1-P0)*(1-P1)*(1-P2)*(1-P3).

If there are only two predicates, the formula is relative simple: (P0+P1)-(P0*P1)

New CE

1.The combined selectivity is : 1-(1-P0)*(1-P1)^(1/2)*(1-P2)^(1/4)*(1-P3)^(1/8)

2.P0,P1 are the selectivity of each value of the column in WHERE clause and  P0>P1.

3.We can have up to 4 predicates. If there are more than 4 columns in the where clause, it only counts the first 4, the rest of them are ignored.

Combined selectivity of four predicates: 1-(1-P0)*(1-P1)^(1/2)*(1-P2)^(1/4)*(1-P3)^(1/8), where   P0>P1>P2>P3.

In practice, Please use following formula to calcualte the selectivity if the combined selectivity has more than 2 predicate.

1-

As you expected, more columns used in where clause, more estiamted rows you get.

Here are examples , adventure 2019 OLTP  database is used in this example

go

IF exists(select 1 from sys.tables where name='SalesOrderDetail' and schema_id=schema_id('dbo'))

drop table SalesOrderDetail

Go

select * into SalesOrderDetail  from Sales.SalesOrderDetail

GO

create statistics I_ProductID on SalesOrderDetail(ProductID) with fullscan

create statistics I_SalesOrderID on SalesOrderDetail(SalesOrderID) with fullscan

Here is the T-SQL query I'm going to test

select * from SalesOrderDetail where ProductID=708 or SalesOrderID=44288

The selectivity of ProductId 708 is 3007/121317=0.02478630

dbcc show_statistics(SalesOrderDetail,I_ProductID)

The selectivity of SalesOrderID 44288 is 34/121317=0.00028025

Legacy CE:

Combined selectivity:(P0+P1)-(P0*P1)=(0.02478630+0.00028025)-(0.02478630*0.00028025)=0.0250596036394250

Estimated row: 0.0250596036394250*121317=3040.15, is rounded down to 3040

select * from SalesOrderDetail where ProductID=708 or SalesOrderID=44288 option(recompile,querytraceon 9481)--trace flag 9481 is used to force legacy CE

New CE:

Combined selectivity:1-(1-P0)*sqrt(1-P1)=1-(1-0.02478630)*(sqrt(1-0.00028025)=0.0249229613952225

Estimated row: 0.0249229613952225*121317=3023.57 is rounded up to 3024

select * from SalesOrderDetail where ProductID=708 or SalesOrderID=44288 option(recompile,querytraceon 2312)--2312 is used to force New CE

Related trace flags

Trace flag 9471: Pickup the lowest selectivity(Most rows) from all predicates, and ignore rest of them. This trace flag only works when New CE is used.

Combined selectivity: Max(P0,P1,P2,P3,P4..)

In this case, it's Max(P0,P1)*CARD=MAX(0.02478630,0.00028025)*CARD=0.02478630*CARD=0.02478630*121317=3007

select * from SalesOrderDetail where ProductID=708 or SalesOrderID=44288 option(recompile,querytraceon 2312,querytraceon 9471)---2312 is used to force New CE

Trace flag 9472:Switch to independent formula, the result is as same as the result of Legacy CE and this trace flag only works when New CE is used.

select * from SalesOrderDetail where ProductID=708 or SalesOrderID=44288 option(recompile,querytraceon 2312,querytraceon 9472)--force New CE

Assessment:

T-SQL

go

IF exists(select 1 from sys.tables where name='SalesOrderDetail' and schema_id=schema_id('dbo'))

drop table SalesOrderDetail

go

select * into SalesOrderDetail  from Sales.SalesOrderDetail

GO

--create five statistics

create statistics I_ProductID on SalesOrderDetail(ProductID) with fullscan

create statistics I_orderqty on SalesOrderDetail(orderqty) with fullscan

create statistics I_SalesOrderID on SalesOrderDetail(SalesOrderID) with fullscan

create statistics I_SpecialOfferid on SalesOrderDetail(SpecialOfferid) with fullscan

create statistics I_modifiedDate on SalesOrderDetail(modifiedDate) with fullscan

1.Please calculate the estimated rows of following queries and run the queries to verify

select *From SalesOrderDetail where ProductId=708 or orderqty=10 or modifiedDate='2011-05-31 00:00:00.000' or SpecialOfferid=4  or SalesOrderID=44288 option(recompile,querytraceon 9481)

go

select *From SalesOrderDetail where ProductId=708 or orderqty=10 or modifiedDate='2011-05-31 00:00:00.000' or SpecialOfferid=4  or SalesOrderID=44288 option(recompile,querytraceon 2312)

2.If there are multi-column stats, does it make different?

This site uses Akismet to reduce spam. Learn how your comment data is processed.