This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.
COPY Into provides a new method of loading data from storage directly into a target object with lesser permissions required and no additional objects or credentials to be created in order to load the data.
For full details on Parameters and Syntax review our documentation here
Permissions
The User execution the Copy Command must have the following Permissions
- ADMINISTER DATABASE BULK OPERATIONS
- INSERT
Requirements and Limitations
With COPY Into we DO NOT Have to create any Database Scoped Credentials like we must for Polybase, the credential will be defined in the COPY Into Statement.
Each File Type has varying support for credentials Per storage type so when deciding on a credential ensure that the storage and file type supports this method of authentication to storage.
|
|||
CSV |
Parquet |
ORC |
|
Azure blob storage |
SAS/MSI/SERVICE PRINCIPAL/KEY/AAD |
SAS/KEY |
SAS/KEY |
Azure Data Lake Gen2 |
SAS/MSI/SERVICE PRINCIPAL/KEY/AAD |
SAS/MSI/SERVICE PRINCIPAL/KEY/AAD |
SAS/MSI/SERVICE PRINCIPAL/KEY/AAD |
Important note: When the source file is Parquet or Orc an External File format is required, which is defined via Create External File Format.
Storage Level Permissions
As per the table above we support certain credentials per file type per storage account type.
RBAC Storage Blob Data Contributor/Reader/Owner: AAD / MSI / Service Principal
Read and List Permissions: Shared Access Signature
Best Practices
- Avoid Specifying Wildcards that would expand over large number of files rather list multiple file locations instead
- Use dfs endpoint for ADLS Gen2 for better performance
Examples:
For all the examples below there are 2 FROM Entries, one for Storage Account Gen2 and the other for Azure Blob Storage. A data source will be either or, remove the entry which is not required.
Authenticating with Managed Service Identity (MSI)
IDENTITY: A constant with a value of ‘Managed Identity’
The MSI Authentication type requires you to create the MSI via powershell for the logical SQL Server
In order to so run the following Powershell commands, you can run this from the Azure Module if you have it installed or run the command from the Azure Portal using Cloudshell.
Connect-AzAccount
Select-AzSubscription -SubscriptionId <your-subscriptionId>
Set-AzSqlServer -ResourceGroupName <your-database-server-resourceGroup> -ServerName <your-database-servername> -AssignIdentity
-- MSI
COPY INTO <TARGET TABLE>
FROM 'https://<Storageaccount>.dfs.core.windows.net/container/file' -- Storage Account Gen2
FROM 'https://<Storageaccount>.blob.core.windows.net/container/file' -- Azure Blob Storage
WITH (
FILE_TYPE = 'CSV',
CREDENTIAL= (IDENTITY = 'Managed Identity'),
FIELDQUOTE = '"',
FIELDTERMINATOR=';',
ROWTERMINATOR='\n',
DATEFORMAT = 'ymd',
MAXERRORS = 10
)
Authenticating with Service Principals
IDENTITY: <CLIENTID>@<OAuth_2.0_V1_Token_EndPoint>
SECRET: AAD Application Service Principal key
Example :
CREDENTIAL = ( IDENTITY = 'abcdefgh-a123-4567-I9jk-l1m12345678n@https://login.microsoftonline.com/12a345bc-67d8-91ef-23gh-4i5jk678lm91/oauth2/token', SECRET = 'maopiqweh+-alskd!)@ASlkd09k1' )
-- Service Principal
COPY INTO <TARGET TABLE>
FROM 'https://<Storageaccount>.dfs.core.windows.net/container/file' -- Storage Account Gen2
FROM 'https://<Storageaccount>.blob.core.windows.net/container/file' -- Azure Blob Storage
WITH (
FILE_TYPE = 'CSV',
CREDENTIAL=(IDENTITY= '<client_id>@<OAuth_2.0_Token_EndPoint>',SECRET='<key>'),
FIELDQUOTE = '"',
FIELDTERMINATOR=';',
ROWTERMINATOR='\n',
DATEFORMAT = 'ymd',
MAXERRORS = 10
)
Authentication with AAD
When using AAD Authentication you have to be connected to the DW as the AAD User, a Service Principal Does NOT Qualify as an AAD User and if you want to make use of the Service Principal follow the guidelines above.
-- AAD
COPY INTO <TARGET TABLE>
FROM 'https://<Storageaccount>.dfs.core.windows.net/container/file' -- Storage Account Gen2
FROM 'https://<Storageaccount>.blob.core.windows.net/container/file' -- Azure Blob Storage
WITH (
FILE_TYPE = 'CSV',
FIELDQUOTE = '"',
FIELDTERMINATOR=';',
ROWTERMINATOR='\n',
DATEFORMAT = 'ymd',
MAXERRORS = 10
)
Authenticating with Storage account key
IDENTITY: A constant with a value of ‘Storage Account Key’
SECRET: Storage account key
-- Storage Account Key
COPY INTO <TARGET TABLE>
FROM 'https://<Storageaccount>.dfs.core.windows.net/container/file' -- Storage Account Gen2
FROM 'https://<Storageaccount>.blob.core.windows.net/container/file' -- Azure Blob Storage
WITH (
FILE_TYPE = 'CSV',
CREDENTIAL= (IDENTITY='Storage Account Key', Secret='<Your Account Key>'
FIELDQUOTE = '"',
FIELDTERMINATOR=';',
ROWTERMINATOR='\n',
DATEFORMAT = 'ymd',
MAXERRORS = 10
)
ORC File with AAD Authentication
The Following example is to provide you with the required steps to read data from ORC files, if you are using Parquet then a File Format for Parquet should be created as well.
CREATE EXTERNAL FILE FORMAT <File Format Name>
WITH (
FORMAT_TYPE = ORC,
DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);
COPY INTO <TARGET TABLE>
FROM 'https://<Storageaccount>.dfs.core.windows.net/container/file' -- Storage Account Gen2
FROM 'https://<Storageaccount>.blob.core.windows.net/container/file' -- Azure Blob Storage
WITH (
FILE_TYPE = 'ORC',
FORMAT = <File Format Name>
FIELDQUOTE = '"',
FIELDTERMINATOR=';',
ROWTERMINATOR='\n',
DATEFORMAT = 'ymd',
MAXERRORS = 10
)
Conclusion:
In the article we covered how to execute COPY Into from a CSV File and what the basic requirements are to use the feature. I strongly advise reading the COPY Into documentation to confirm all the available options and functionality for the feature.