Lesson Learned #165: How to reduce the time spent downloading a large resultset


Today, I worked on a service request where our customer needed to download a large resultset from Azure SQL Database to an on-premises server. The table has several large object columns (XML, varchar(max), text) and millions of rows. In this post, I would like to share several tests that I ran and how I reduced the download time.

 

Initial points

 

  • Try increasing the packet size in your connection string to a higher value (see the sketch after this list). 
  • Use the Redirect connection policy instead of Proxy to improve connection performance. 
  • Regarding redirection, remember to use the latest drivers, because some older drivers cannot use redirection. 
  • As this is a pure data-processing workload, if possible use the Premium or Business Critical tier to reduce I/O latency. 
  • On premises, place the data and log files on different drives to improve I/O.
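
As a quick illustration of the first point, a connection string with a larger packet size might look like the following minimal sketch (server, database, and credential values are placeholders; the default packet size is 8000 bytes and the maximum is 32767). Note that the Redirect policy is set on the server's connection policy, not in the connection string itself.

// Minimal sketch, assuming System.Data.SqlClient and placeholder credentials.
string connectionString =
    "Server=tcp:yourserver.database.windows.net,1433;" +
    "Database=yourdatabase;" +
    "User ID=youruser;Password=yourpassword;" +
    "Encrypt=True;" +
    "Packet Size=32767;"; // raised from the default 8000 to reduce round trips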

In Azure SQL Database, I created a table and filled it with data:

 

  • Basically, I created the following table:

 

CREATE TABLE [dbo].[Destination](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [Name1] [varchar](4000) NULL,
    [Name2] [varchar](4000) NULL,
    [Name3] [varchar](4000) NULL,
    [Name4] [varchar](4000) NULL,
    [Name5] [varchar](4000) NULL,
    PRIMARY KEY CLUSTERED ([ID] ASC)
)

 

  • Running the following statements multiple times (the second INSERT doubles the row count on each run), I ended up with around 7 million rows.

 

INSERT INTO Destination (Name1, Name2, Name3, Name4, Name5)
VALUES (REPLICATE('X',4000), REPLICATE('X',4000), REPLICATE('X',4000),
        REPLICATE('X',4000), REPLICATE('X',4000))

INSERT INTO Destination (Name1, Name2, Name3, Name4, Name5)
SELECT Name1, Name2, Name3, Name4, Name5 FROM Destination

 

 

On premises:

 

  • I developed a small C# application that implements three different processes: 
    • The first process read the whole table from Azure SQL Database and downloaded the data with SqlBulkCopy, but the time spent was high; I saw that the transfer rate was about 100-200 mb/s. The method is shown below, followed by a usage sketch.

 

 

// Assumes: using C = System.Data.SqlClient;
// OnSqlRowsCopied is a progress-logging handler defined elsewhere in the application.
private void LoadDataReaderWithoutCompression(C.SqlDataReader newProducts,
                                              int lCutOver = 10000,
                                              string sDestinationTable = "Destination")
{
    using (C.SqlBulkCopy bulkCopy = new C.SqlBulkCopy(GetConnectionStringTarget(0),
               C.SqlBulkCopyOptions.KeepIdentity |
               C.SqlBulkCopyOptions.KeepNulls |
               C.SqlBulkCopyOptions.TableLock))
    {
        bulkCopy.DestinationTableName = sDestinationTable;
        try
        {
            bulkCopy.BulkCopyTimeout = 6000;
            bulkCopy.SqlRowsCopied += new C.SqlRowsCopiedEventHandler(OnSqlRowsCopied);
            bulkCopy.NotifyAfter = 2000;
            bulkCopy.EnableStreaming = false;
            bulkCopy.BatchSize = lCutOver;
            bulkCopy.WriteToServer(newProducts);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
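
For context, here is a minimal sketch of how the method above might be invoked; GetConnectionStringSource() is a hypothetical helper that returns the Azure SQL Database connection string:

// Hedged usage sketch; GetConnectionStringSource() is an assumed helper.
using (C.SqlConnection sourceConnection = new C.SqlConnection(GetConnectionStringSource()))
{
    sourceConnection.Open();
    C.SqlCommand command = new C.SqlCommand(
        "SELECT ID, Name1, Name2, Name3, Name4, Name5 FROM Destination",
        sourceConnection);
    command.CommandTimeout = 6000;
    // Stream the rows straight from the reader into the local bulk copy.
    using (C.SqlDataReader reader = command.ExecuteReader())
    {
        LoadDataReaderWithoutCompression(reader);
    }
}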

 

 

  • The second process compressed the data with the COMPRESS function before downloading it. Basically, the idea was (an end-to-end sketch follows the table definition below):
    • Create a staging table with the following structure. 
    • Execute the query: INSERT INTO [_M$_Destination_X] (ID, Name1, Name2, Name3, Name4, Name5) SELECT ID, COMPRESS(Name1) AS Name1, COMPRESS(Name2) AS Name2, COMPRESS(Name3) AS Name3, COMPRESS(Name4) AS Name4, COMPRESS(Name5) AS Name5 FROM Destination
    • Download the compressed data using SqlBulkCopy.
    • Decompress the data at the destination by running the following T-SQL (DECOMPRESS returns varbinary(max), so the result is cast back to varchar): INSERT INTO [Destination] (ID, Name1, Name2, Name3, Name4, Name5) SELECT ID, CAST(DECOMPRESS(Name1) AS varchar(max)) AS Name1, CAST(DECOMPRESS(Name2) AS varchar(max)) AS Name2, CAST(DECOMPRESS(Name3) AS varchar(max)) AS Name3, CAST(DECOMPRESS(Name4) AS varchar(max)) AS Name4, CAST(DECOMPRESS(Name5) AS varchar(max)) AS Name5 FROM [_M$_Destination_X]

 

CREATE TABLE [dbo].[_M$_Destination_X](
    [ID] [int] NOT NULL,
    [Name1] [varbinary](max) NULL,
    [Name2] [varbinary](max) NULL,
    [Name3] [varbinary](max) NULL,
    [Name4] [varbinary](max) NULL,
    [Name5] [varbinary](max) NULL,
    PRIMARY KEY CLUSTERED ([ID] ASC)
)
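
Putting the three steps together, here is a minimal sketch of the single-threaded compressed transfer. GetConnectionStringSource() and GetConnectionStringTarget(0) are assumed helpers carried over from the code above, and the staging table is assumed to exist on both sides:

// Hedged sketch: compress on the source, bulk copy, decompress on the destination.
using (C.SqlConnection source = new C.SqlConnection(GetConnectionStringSource()))
{
    source.Open();

    // 1. Materialize a compressed copy of the data on the Azure SQL Database side.
    C.SqlCommand compress = new C.SqlCommand(
        "INSERT INTO [_M$_Destination_X] (ID, Name1, Name2, Name3, Name4, Name5) " +
        "SELECT ID, COMPRESS(Name1), COMPRESS(Name2), COMPRESS(Name3), " +
        "COMPRESS(Name4), COMPRESS(Name5) FROM Destination", source);
    compress.CommandTimeout = 6000;
    compress.ExecuteNonQuery();

    // 2. Bulk copy the (much smaller) compressed rows to the local staging table.
    C.SqlCommand select = new C.SqlCommand("SELECT * FROM [_M$_Destination_X]", source);
    select.CommandTimeout = 6000;
    using (C.SqlDataReader reader = select.ExecuteReader())
    using (C.SqlBulkCopy bulkCopy = new C.SqlBulkCopy(GetConnectionStringTarget(0)))
    {
        bulkCopy.DestinationTableName = "_M$_Destination_X";
        bulkCopy.BulkCopyTimeout = 6000;
        bulkCopy.WriteToServer(reader);
    }
}

using (C.SqlConnection target = new C.SqlConnection(GetConnectionStringTarget(0)))
{
    target.Open();

    // 3. Decompress locally into the final table. IDENTITY_INSERT is needed
    // here if the local Destination table keeps the IDENTITY property.
    C.SqlCommand decompress = new C.SqlCommand(
        "SET IDENTITY_INSERT [Destination] ON; " +
        "INSERT INTO [Destination] (ID, Name1, Name2, Name3, Name4, Name5) " +
        "SELECT ID, CAST(DECOMPRESS(Name1) AS varchar(max)), " +
        "CAST(DECOMPRESS(Name2) AS varchar(max)), CAST(DECOMPRESS(Name3) AS varchar(max)), " +
        "CAST(DECOMPRESS(Name4) AS varchar(max)), CAST(DECOMPRESS(Name5) AS varchar(max)) " +
        "FROM [_M$_Destination_X]; " +
        "SET IDENTITY_INSERT [Destination] OFF;", target);
    decompress.CommandTimeout = 6000;
    decompress.ExecuteNonQuery();
}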

 

 

  • The second process performed very well, because compressing and de-compressing let me work around the network bottleneck. But it was a single running thread; what happens with many millions of rows? In that situation, I modified the source to use a configurable number of threads (for example, 10 threads running in parallel, each reading 150000 rows). A sketch of this variant follows the list. 
    • Each process read 150000 rows using the following T-SQL: INSERT INTO [_Tmp100] (ID, Name1, Name2, Name3, Name4, Name5) SELECT ID, COMPRESS(Name1) AS Name1, COMPRESS(Name2) AS Name2, COMPRESS(Name3) AS Name3, COMPRESS(Name4) AS Name4, COMPRESS(Name5) AS Name5 FROM Destination ORDER BY ID OFFSET 0 ROWS FETCH NEXT 150000 ROWS ONLY
    • Using SqlBulkCopy, I transferred the data to the on-premises server. 
    • Finally, by running the following query, I decompressed the data and saved it in the destination table: INSERT INTO [Destination] (ID, Name1, Name2, Name3, Name4, Name5) SELECT ID, CAST(DECOMPRESS(Name1) AS varchar(max)) AS Name1, CAST(DECOMPRESS(Name2) AS varchar(max)) AS Name2, CAST(DECOMPRESS(Name3) AS varchar(max)) AS Name3, CAST(DECOMPRESS(Name4) AS varchar(max)) AS Name4, CAST(DECOMPRESS(Name5) AS varchar(max)) AS Name5 FROM [_Tmp100]
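
This is a minimal sketch of the parallel variant under a few assumptions: the same hypothetical connection-string helpers as above, per-thread local staging tables named _Tmp100, _Tmp101, and so on (with the varbinary structure shown earlier), and each compressed window streamed directly through SqlBulkCopy instead of being staged on the source first:

// Hedged sketch: 10 workers, each compressing and downloading one
// 150000-row window. Assumes: using C = System.Data.SqlClient;
//                             using System.Threading.Tasks;
const int ChunkSize = 150000;
const int ThreadCount = 10;

Parallel.For(0, ThreadCount, chunk =>
{
    using (C.SqlConnection source = new C.SqlConnection(GetConnectionStringSource()))
    {
        source.Open();
        C.SqlCommand command = new C.SqlCommand(
            "SELECT ID, COMPRESS(Name1) AS Name1, COMPRESS(Name2) AS Name2, " +
            "COMPRESS(Name3) AS Name3, COMPRESS(Name4) AS Name4, COMPRESS(Name5) AS Name5 " +
            "FROM Destination ORDER BY ID " +
            "OFFSET @Offset ROWS FETCH NEXT @Fetch ROWS ONLY", source);
        command.Parameters.AddWithValue("@Offset", chunk * ChunkSize);
        command.Parameters.AddWithValue("@Fetch", ChunkSize);
        command.CommandTimeout = 6000;

        using (C.SqlDataReader reader = command.ExecuteReader())
        using (C.SqlBulkCopy bulkCopy = new C.SqlBulkCopy(GetConnectionStringTarget(0)))
        {
            // Each thread lands its window in its own local staging table.
            bulkCopy.DestinationTableName = "_Tmp" + (100 + chunk);
            bulkCopy.BulkCopyTimeout = 6000;
            bulkCopy.WriteToServer(reader);
        }
    }
});

The per-table DECOMPRESS insert from the last bullet above then runs once per staging table.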

 

In the end, I was able to cut hours from the time this process took. I also collected other lessons learned on the on-premises side about speeding up the process and reducing PAGEIOLATCH contention, but that will be for another post.

 

Enjoy!
