Today, I worked on a service request where our customer wanted to download a large result set from Azure SQL Database to on-premises. The table has several blob columns (XML, varchar(max), text) and millions of rows. In this situation, I would like to share with you several tests that I ran and how I reduced the download time.
Initial points
- Try to increase the packet size in your connection string to a higher value (see the example after this list).
- Instead of the Proxy connection policy, use the Redirect connection policy to improve the connection.
- When using redirection, remember to use the latest drivers, because some older drivers are not able to use redirection.
- As this is pure data processing, if possible use the Premium or Business Critical tier to reduce the I/O latency.
- On-premises, try to place the data and log files in different locations to improve the I/O.
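As an illustration of the first point, here is a minimal sketch of how a C# application could raise the packet size, assuming the Microsoft.Data.SqlClient driver; the server, database and credential values are placeholders:

```
using Microsoft.Data.SqlClient;

class ConnectionFactory
{
    // Placeholder server, database and credentials.
    public static SqlConnection Create()
    {
        var builder = new SqlConnectionStringBuilder
        {
            DataSource     = "yourserver.database.windows.net",
            InitialCatalog = "yourdatabase",
            UserID         = "youruser",
            Password       = "yourpassword",
            // Larger network packets help when moving large blob values;
            // the client default is 8000 bytes.
            PacketSize     = 16384
        };
        return new SqlConnection(builder.ConnectionString);
    }
}
```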
In Azure SQL Database, I created a table and filled it with data:
- Basically, I created a table with an ID column plus five large NAME columns holding the blob data (a sketch of the structure is shown after this list).
- Running a data-generation INSERT query multiple times, I got around 7 million rows (also sketched after this list).
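A minimal sketch of what the source table and the data-generation step could look like is below. The column names come from the queries later in the post, and for simplicity every NAME column is declared as VARCHAR(MAX), although the real table mixed XML, varchar(max) and text columns; the connection string is a placeholder.

```
using Microsoft.Data.SqlClient;

class SetupSourceTable
{
    // Assumed structure: ID plus five large NAME columns.
    const string CreateTableSql = @"
        CREATE TABLE dbo.Destination
        (
            ID    INT NOT NULL PRIMARY KEY,
            NAME1 VARCHAR(MAX) NULL,
            NAME2 VARCHAR(MAX) NULL,
            NAME3 VARCHAR(MAX) NULL,
            NAME4 VARCHAR(MAX) NULL,
            NAME5 VARCHAR(MAX) NULL
        );";

    const string SeedRowSql = @"
        INSERT INTO dbo.Destination (ID, NAME1, NAME2, NAME3, NAME4, NAME5)
        VALUES (1, REPLICATE('A', 4000), REPLICATE('B', 4000),
                REPLICATE('C', 4000), REPLICATE('D', 4000), REPLICATE('E', 4000));";

    // Each execution doubles the row count by re-inserting the existing rows
    // with shifted IDs, so a handful of runs produces millions of rows.
    const string DoubleRowsSql = @"
        INSERT INTO dbo.Destination (ID, NAME1, NAME2, NAME3, NAME4, NAME5)
        SELECT ID + (SELECT MAX(ID) FROM dbo.Destination),
               NAME1, NAME2, NAME3, NAME4, NAME5
        FROM dbo.Destination;";

    static void Main()
    {
        using var conn = new SqlConnection("<azure-sql-connection-string>"); // placeholder
        conn.Open();

        new SqlCommand(CreateTableSql, conn).ExecuteNonQuery();
        new SqlCommand(SeedRowSql, conn).ExecuteNonQuery();

        for (int i = 0; i < 23; i++)   // 2^23 is roughly 8.4 million rows; adjust as needed
            new SqlCommand(DoubleRowsSql, conn) { CommandTimeout = 0 }.ExecuteNonQuery();
    }
}
```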
On-premises:
- I developed a small C# application that implements 3 different processes:
- The first process read the whole table from Azure SQL Database and downloaded the data using bulk copy, but the time spent was high; the transfer rate I saw was only about 100-200 mb/s (a sketch of this baseline process is shown after this list).
- The second process compressed the data with the COMPRESS function before downloading it (see the second sketch after this list). Basically, the idea was:
- Create a staging table, [_M$_Destination_X], with the same columns as the source but typed to hold the compressed (VARBINARY) values.
- Execute the query INSERT INTO [_M$_Destination_X] (ID,NAME1,NAME2,NAME3,NAME4,NAME5) SELECT ID,COMPRESS(NAME1) AS NAME1,COMPRESS(NAME2) AS NAME2,COMPRESS(NAME3) AS NAME3, COMPRESS(NAME4) AS NAME4, COMPRESS(NAME5) AS NAME5 FROM Destination
- Download the compressed data using bulk copy.
- Uncompress the data at the destination by running the following T-SQL: INSERT INTO [Destination] (ID,NAME1,NAME2,NAME3,NAME4,NAME5) SELECT ID,DECOMPRESS(NAME1) AS NAME1,DECOMPRESS(NAME2) AS NAME2,DECOMPRESS(NAME3) AS NAME3, DECOMPRESS(NAME4) AS NAME4, DECOMPRESS(NAME5) AS NAME5 FROM [_M$_Destination_X]
- The second process worked very well, because compressing and decompressing the data let me work around the network bottleneck. But it ran on a single thread; what happens when there are millions and millions of rows? In this situation, I modified the source code to use a configurable number of threads (for example, running 10 threads in parallel, each one reading 150,000 rows), as in the third sketch after this list:
- Every thread read 150,000 rows using the following T-SQL (each thread with its own OFFSET value): INSERT INTO [_Tmp100] (ID,NAME1,NAME2,NAME3,NAME4,NAME5) SELECT ID,COMPRESS(NAME1) AS NAME1,COMPRESS(NAME2) AS NAME2,COMPRESS(NAME3) AS NAME3, COMPRESS(NAME4) AS NAME4, COMPRESS(NAME5) AS NAME5 FROM Destination ORDER BY ID OFFSET 0 ROWS FETCH NEXT 150000 ROWS ONLY
- Using bulk copy, I transferred the data to the on-premises server.
- Finally, by running the following query, I was able to uncompress the data and save it into the destination table: INSERT INTO [Destination] (ID,NAME1,NAME2,NAME3,NAME4,NAME5) SELECT ID,DECOMPRESS(NAME1) AS NAME1,DECOMPRESS(NAME2) AS NAME2,DECOMPRESS(NAME3) AS NAME3, DECOMPRESS(NAME4) AS NAME4, DECOMPRESS(NAME5) AS NAME5 FROM [_Tmp100]
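Below are three sketches of how the processes described above could be coded. They are illustrations only, assuming Microsoft.Data.SqlClient, the table names used in this post and placeholder connection strings, not the exact source of the application. First, the baseline process: stream the whole table with a SqlDataReader and write it locally with SqlBulkCopy.

```
using Microsoft.Data.SqlClient;

class BaselineDownload
{
    static void Main()
    {
        const string azureConn = "<azure-sql-connection-string>";   // placeholder
        const string localConn = "<on-premises-connection-string>"; // placeholder

        using var source = new SqlConnection(azureConn);
        source.Open();

        // Stream every row (including the large blob columns) from Azure SQL Database.
        using var cmd = new SqlCommand(
            "SELECT ID, NAME1, NAME2, NAME3, NAME4, NAME5 FROM dbo.Destination", source)
            { CommandTimeout = 0 };
        using var reader = cmd.ExecuteReader();

        // Bulk copy the stream into the on-premises table.
        using var bulk = new SqlBulkCopy(localConn)
        {
            DestinationTableName = "dbo.Destination",
            BulkCopyTimeout = 0,
            BatchSize = 10000,
            EnableStreaming = true   // avoid buffering the blobs in memory
        };
        bulk.WriteToServer(reader);
    }
}
```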
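Second, the compressed variant. For brevity this sketch streams the COMPRESS output straight into the on-premises staging table instead of materializing it in an intermediate table on the Azure side first, as the original steps describe; the assumed structure of the staging table is shown in the comment.

```
using Microsoft.Data.SqlClient;

class CompressedDownload
{
    /* Assumed staging table on-premises (COMPRESS returns VARBINARY(MAX)):
       CREATE TABLE [_M$_Destination_X]
       (
           ID    INT NOT NULL,
           NAME1 VARBINARY(MAX) NULL, NAME2 VARBINARY(MAX) NULL,
           NAME3 VARBINARY(MAX) NULL, NAME4 VARBINARY(MAX) NULL,
           NAME5 VARBINARY(MAX) NULL
       ); */

    // Compress the blob columns on the Azure side so fewer bytes cross the network.
    const string CompressSelect = @"
        SELECT ID, COMPRESS(NAME1) AS NAME1, COMPRESS(NAME2) AS NAME2,
               COMPRESS(NAME3) AS NAME3, COMPRESS(NAME4) AS NAME4,
               COMPRESS(NAME5) AS NAME5
        FROM dbo.Destination";

    // Decompress locally into the final table (a CAST back to the original
    // data types may be needed, depending on the column definitions).
    const string DecompressInsert = @"
        INSERT INTO dbo.Destination (ID, NAME1, NAME2, NAME3, NAME4, NAME5)
        SELECT ID, DECOMPRESS(NAME1), DECOMPRESS(NAME2), DECOMPRESS(NAME3),
               DECOMPRESS(NAME4), DECOMPRESS(NAME5)
        FROM [_M$_Destination_X]";

    static void Main()
    {
        const string azureConn = "<azure-sql-connection-string>";   // placeholder
        const string localConn = "<on-premises-connection-string>"; // placeholder

        using (var source = new SqlConnection(azureConn))
        {
            source.Open();
            using var cmd = new SqlCommand(CompressSelect, source) { CommandTimeout = 0 };
            using var reader = cmd.ExecuteReader();

            using var bulk = new SqlBulkCopy(localConn)
            {
                DestinationTableName = "[_M$_Destination_X]",
                BulkCopyTimeout = 0,
                EnableStreaming = true
            };
            bulk.WriteToServer(reader);
        }

        // One pass on the local server expands the compressed rows into the destination table.
        using var local = new SqlConnection(localConn);
        local.Open();
        new SqlCommand(DecompressInsert, local) { CommandTimeout = 0 }.ExecuteNonQuery();
    }
}
```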
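Third, a sketch of the parallel, chunked variant. Here the fan-out is done with Parallel.For rather than hand-managed threads: each worker pulls one 150,000-row chunk, compressed on the Azure side with OFFSET/FETCH, and bulk copies it into a single local staging table. The original application used a configurable thread count and its own staging-table naming.

```
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

class ParallelCompressedDownload
{
    const string azureConn = "<azure-sql-connection-string>";   // placeholder
    const string localConn = "<on-premises-connection-string>"; // placeholder

    static void Main()
    {
        const int chunkSize = 150_000;
        const int totalRows = 7_000_000;            // approximate source row count
        int chunks = (totalRows + chunkSize - 1) / chunkSize;

        // Cap the number of concurrent downloads (10 parallel readers in the test above).
        var options = new ParallelOptions { MaxDegreeOfParallelism = 10 };

        Parallel.For(0, chunks, options, chunk =>
        {
            // Each worker compresses and fetches its own 150,000-row window.
            string query = $@"
                SELECT ID, COMPRESS(NAME1) AS NAME1, COMPRESS(NAME2) AS NAME2,
                       COMPRESS(NAME3) AS NAME3, COMPRESS(NAME4) AS NAME4,
                       COMPRESS(NAME5) AS NAME5
                FROM dbo.Destination
                ORDER BY ID
                OFFSET {chunk * chunkSize} ROWS FETCH NEXT {chunkSize} ROWS ONLY";

            using var source = new SqlConnection(azureConn);
            source.Open();
            using var cmd = new SqlCommand(query, source) { CommandTimeout = 0 };
            using var reader = cmd.ExecuteReader();

            // Append the chunk to the local compressed staging table.
            using var bulk = new SqlBulkCopy(localConn)
            {
                DestinationTableName = "[_Tmp100]",
                BulkCopyTimeout = 0,
                EnableStreaming = true
            };
            bulk.WriteToServer(reader);
        });

        // A final pass decompresses everything into the destination table.
        using var local = new SqlConnection(localConn);
        local.Open();
        new SqlCommand(@"
            INSERT INTO dbo.Destination (ID, NAME1, NAME2, NAME3, NAME4, NAME5)
            SELECT ID, DECOMPRESS(NAME1), DECOMPRESS(NAME2), DECOMPRESS(NAME3),
                   DECOMPRESS(NAME4), DECOMPRESS(NAME5)
            FROM [_Tmp100]", local) { CommandTimeout = 0 }.ExecuteNonQuery();
    }
}
```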
In the end, I was able to reduce the time spent on this process by hours. I also gathered other lessons learned on the on-premises side to speed up the process and reduce PAGEIOLATCH contention, but that will be for another post.
Enjoy!