This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.
I was working on this case where a customer load a certain amount of rows and when he compared the rows that he already had on the DW to the rows that he wanted to insert the process just get stuck for 2 hours and fails with TEMPDB full.
There was no transaction open or running before the query or even with the query. The query was running alone and failing alone.
This problem could be due to different reasons. I am just trying to show some possibilities to troubleshooting by writing this post.
Following some tips:
1) Check for transactions open that could fill your tempdb
2) Check if there is skew data causing a large amount of the data moving between the nodes (https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute#:~:text=A%20quick%20way%20to%20check%20for%20data%20skew,should%20be%20spread%20evenly%20across%20all%20the%20distributions.)
3) Check the stats. Maybe the plan is been misestimate. (https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-statistics)
4)If like me, there was no skew data, no stats and no transaction open causing this.
So the next step was to check the select performance and execution plan.
1) Find the active queries:
2) By finding the query that you know is failing to get the query id:
3) Filter the step that is taking most of the time based on the last query results:
4) From the last query results filter the distribution id and spid replace those values on the next query:
We checked the XML plan that we got a sort to follow by estimated rows pretty high in each of the distributions. So it seems while the plan was been sorted in tempdb it just run out of space.
Hence, this was not a case of stats not updated and also the distribution keys were hashed as the same in all the tables involved which were temp tables.
If you want to know more:
So basically we had something like:
After we discussed we reach the conclusion what it was needed was everything from TMP1 that does not exist on TMP2 and TMP3. So as the plan with the LEFT Join was not a very good plan with a potential cartesian product we replace with the following:
This second one runs in seconds. So the estimation was much better and the Tempdb was able to handle just fine.
That is it!
Liliam
UK Engineer.