This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs - .
We are very happy to announce the private preview of Data Virtualization in Azure SQL Database. Data Virtualization in Azure SQL Database enables working with CSV, Parquet, and Delta files stored on Azure Storage Account v2 (Azure Blob Storage) and Azure Data Lake Storage Gen2. Azure SQL Database will now support: CREATE EXTERNAL TABLE (CET), CREATE EXTERNAL TABLE AS SELECT (CETAS) as well as enhanced OPENROWSET capabilities to work with the new file formats.
The list of capabilities available in private preview are:
- CREATE DATABASE SCOPED CREDENTIAL (Transact-SQL) - SQL Server | Microsoft Learn with Shared Access Signature (SAS Token) support;
- CREATE EXTERNAL DATA SOURCE (Transact-SQL) - SQL Server | Microsoft Learn with support for Azure Storage Account v2 (ABS), and Azure Data Lake Gen2 (ADLS);
- CREATE EXTERNAL FILE FORMAT (Transact-SQL) - SQL Server | Microsoft Learn with support for CSV, and Parquet;
- CREATE EXTERNAL TABLE (Transact-SQL) - SQL Server | Microsoft Learn with support for CSV, Parquet, and Delta;
- CREATE EXTERNAL TABLE AS SELECT (CETAS) (Transact-SQL) - SQL Server | Microsoft Learn with support for CSV and Parquet;
- OPENROWSET (Transact-SQL) - SQL Server | Microsoft Learn with support for CSV, Parquet, and Delta.
Just like in SQL Server 2022 (Data Virtualization with PolyBase for SQL Server 2022 - Microsoft SQL Server Blog) and Azure SQL Managed Instance (Data virtualization now generally available in Azure SQL Managed Instance - Microsoft Community Hub), Data Virtualization in Azure SQL Database also supports updated metadata functions, wildcard search mechanism, and procedures that enables the users to query across different folders and leverage partition pruning, commands like:
Major benefits of Data Virtualization in Azure SQL Database are:
- No data movement: Access real-time data where it is.
- T-SQL language: Ability to leverage all the benefits of the T-SQL language, its commands, enhancements, and familiarity.
- One source for all your data: Users and applications can use Azure SQL Database as a data hub, accessing all the required data in a single environment.
- Security: Leverage SQL security capabilities to simplify permissions, credential management, and control
- Export: Easily export data as CSV or Parquet to any Azure Storage location, either to empower other applications or reduce cost.
For simplicity, we are going to use publicly available NYC Taxi dataset (NYC Taxi and Limousine yellow dataset - Azure Open Datasets | Microsoft Learn) that allows anonymous access.
-- Create data source for NYC public dataset: CREATE EXTERNAL DATA SOURCE NYCTaxiExternalDataSource WITH (LOCATION = 'abs://email@example.com'); -- Query all files with .parquet extension in folders matching name pattern: SELECT TOP 1000 * FROM OPENROWSET( BULK 'yellow/puYear=*/puMonth=*/*.parquet', DATA_SOURCE = 'NYCTaxiExternalDataSource', FORMAT = 'parquet' ) AS filerows; -- Schema discovery: EXEC sp_describe_first_result_set N' SELECT vendorID, tpepPickupDateTime, passengerCount FROM OPENROWSET( BULK ''yellow/*/*/*.parquet'', DATA_SOURCE = ''NYCTaxiExternalDataSource'', FORMAT=''parquet'' ) AS nyc'; -- Query top 100 files and project file path and file name information for each row: SELECT TOP 100 filerows.filepath(1) as [Year_Folder], filerows.filepath(2) as [Month_Folder], filerows.filename() as [File_name], filerows.filepath() as [Full_Path] FROM OPENROWSET( BULK 'yellow/puYear=*/puMonth=*/*.parquet', DATA_SOURCE = 'NYCTaxiExternalDataSource', FORMAT = 'parquet') AS filerows; -- Create external file format for Parquet: CREATE EXTERNAL FILE FORMAT DemoFileFormat WITH ( FORMAT_TYPE=PARQUET ); -- Create external table: CREATE EXTERNAL TABLE tbl_TaxiRides( vendorID VARCHAR(100) COLLATE Latin1_General_BIN2, tpepPickupDateTime DATETIME2, tpepDropoffDateTime DATETIME2, passengerCount INT, tripDistance FLOAT, puLocationId VARCHAR(8000), doLocationId VARCHAR(8000), startLon FLOAT, startLat FLOAT, endLon FLOAT, endLat FLOAT, rateCodeId SMALLINT, storeAndFwdFlag VARCHAR(8000), paymentType VARCHAR(8000), fareAmount FLOAT, extra FLOAT, mtaTax FLOAT, improvementSurcharge VARCHAR(8000), tipAmount FLOAT, tollsAmount FLOAT, totalAmount FLOAT ) WITH ( LOCATION = 'yellow/puYear=*/puMonth=*/*.parquet', DATA_SOURCE = NYCTaxiExternalDataSource, FILE_FORMAT = DemoFileFormat ); -- Query the external table: SELECT TOP 1000 * FROM tbl_TaxiRides;
Private Preview Sign-up form:
Data Virtualization in Azure SQL Database is in active development, Private Preview users will help shape the future of the feature, with regular interactions with Data Virtualization product team. If you want to be part of the private preview a sign-up form is required and can be found here.