This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.
Cross-database and cross-cluster relationships in PBI over ADX tables
You may have more than one ADX database and probably more than one ADX cluster.
In some cases, you want to join tables or functions from more than one database/cluster.
In this article you’ll see how to make sure that such joins are folded and sent to the ADX backend instead of executing at the level of the Power Query mashup engine.
Everything mentioned here is applicable to Azure Data Explorer, Synapse Data Explorer, and Fabric RTA.
We’ll use the nyc_taxi table in the samples database in the help cluster.
| started by creating a query to this table in Power BI desktop,
I added a small dimension table to the ContosoSales database called rate_codes.
It can be used to add a name to the rate_code in the nyc_taxi table.
I created a query for this table in the same PBI model.
It is important to use the IsDimension=true setting in the dimension query.
Next step is creating a relationship between the two tables based on the rate_code.
I created the relationship as 1:m with assume referential integrity.
There is a small number of rides with rate codes that do not appear in the dimension table.
The number of such rides is very small, and there is a performance advantage in using inner joins.
In the attached example crossdatabase.pbix the table takes more than 50% longer to refresh with leftouter join.
Out of the box database() notation
If you look at the queries generated by PBI using .show queries, you can see that the dimension table which is coming from a different database is mentioned as database('ContosoSales').["Rate_Codes"]
This is an automatic behavior by the connector and it allows the join to be constructed in a normal way.
A more complicated case is when the two tables in a relationship are in two different clusters.
By default, PBI will not attempt to join the two tables in ADX although it is perfectly possible.
Any time that the cluster parameter is not the same in two queries, they are considered as two different sources and no joins will be generated between them.
In the attached file crosscluster.pbix, there are two queries with a relationship.
When the transactions are filtered by countries, PBI will create a list with all customer keys from the selected countries and create a where clause “where CustomerKey in(11,22,33…)”
The list of customers can be very long and affect performance.
You can see this behavior in the attached file crosscluster.pbix.
First, you need to copy the customers table to your own cluster using:
.set Customers <| cluster(“help”).database(‘ContosoSales’).Customers
When opening the pbix, it will not work until you change the values of two parameters:
After you click OK PBI will connect to each of the two clusters and will not use a join between them.
How to trigger a cross cluster join from PBI
One way would be to write the query with lookup in Kusto web explorer and export to PBI.
| lookup kind=inner
cluster('Your Cluster').database( 'Your Database').Customers
In the UI it will be
Both the fact and the dimension are in the context of the same cluster, but the third parameter includes a fully qualified table including a cluster and a database.
Another and simpler way used in the attached file crossfilter1.pbix is to create a query that uses the same cluster and database as fir the fact table, and in the table name dialog use the notation with cluster and database.
Again, I used parameters that you can edit and enter your cluster and database.
The M query will look like this in the advanced editor let
Source = AzureDataExplorer.Contents("https://help.kusto.windows.net","ContosoSales","cluster('" & DimCluster & "').database('" & DimDatabase & "').Customers", [IsDimension=true])