This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs - .

Why are 273 queries needed to refresh one chart?

Scenario

A customer approached me and complained that a page in a PBI report was taking a very long time to refresh. (> 100 seconds)

An investigation showed that one chart was sending 273 queries to the ADX cluster.

Was it a coincidence that the chart showed a distinct count of customers in each one of 273 points of sale?

Where are the tables in a star schema coming from?

You design your model as a star schema which is the recommended way even with big data.

The fact table is probably large, and you use direct query from ADX/RTA KQL database.

The question is what storage mode do you use for the dimensions?

The dimensions are usually not very large so technically you can import them.

They are usually not changing very frequently so you can use scheduled refresh.

To be able to update the dimensions at will, you may want to store them in some standard database like SQL server and not ingest them into ADX.

In all these cases in which the fact table is in DQ mode and the dimensions are in import mode or even DQ from a different source, you might encounter the situation described with a ton of queries for a single visual.

What exactly is the problem?

If you look at the attached example you’ll see one page using the Customers table in import mode and the other showing the same visuals where all tables are in DQ mode.

In each page , the upper chart shows a simple count of rows for each week and is fast in both pages.

The bottom chart uses measures and even a % measure that looks more complicated, but it can be calculated with a single query in both cases.

The middle visual shows the distinct count of customers for each week and this one behaves very differently when the date dimension is imported or in DQ.

The reason for this big difference is that in the import case an individual query is send to calculate the distinct count for every week, while in the direct case one query joining the two tables is calculating the results for all weeks.

The simple way to write such a query would be:

Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('ContosoSales')]

SalesFact

| lookup kind=inner Dates on $left.DateKey==$right.Datekey

| where CalendarYear ==2008

| summarize Customers=dcount(CustomerKey) by CalendarWeek

| render columnchart

If you click on the Web link, it will take you to our Kusto Web Explorer and you can see that the query is blazing fast.

Power Bi generates a more complicated query but it is basically the same.

Takeaway

When your fact is in DQ mode, don’t use dimensions in import mode unless you are sure that all your measures are additive, and you don’t need to calculate distinct counts.

If the dimension is in ADX, the recommended storage mode is dual.

If the dimension table is stored in a database supported for ADX external tables define an external table in the ADX database and use it instead of importing int PBI.

When an external table is joined with another table, it will be read into memory once.

Why are 273 queries needed to refresh one chart in Power BI ?

Why are 273 queries needed to refresh one chart?

Scenario

Where are the tables in a star schema coming from?

What exactly is the problem?

Takeaway

Leave a Reply Cancel reply