VNet or No VNet: Secure data access from SSIS in Azure Data Factory

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.

So, you’ve finally decided to lift and shift your on-premises SQL Servers, including SQL Server Integration Services (SSIS), to the cloud.  Now what?  Our recommended migration destination is the all Platform as a Service (PaaS) solution of Azure SQL Database (DB)/Managed Instance (MI) and SSIS in Azure Data Factory (ADF).  This offers you the lowest Total Cost of Ownership (TCO) and the highest Return on Investment (ROI), and lets you make the most of your existing SQL Server licenses via the Azure Hybrid Benefit (AHB) option.

 

In the past, when you ran your existing ETL workloads on premises, your SSIS instance and data stores, including SQL Servers, file shares, etc., were likely placed together in the same location behind a corporate firewall, so data access was more or less secure.  Now, when you run your migrated ETL workloads on SSIS Integration Runtime (IR) in ADF, you’ll want to ensure that your SSIS packages can still securely access your existing data stores on premises, as well as new ones in the cloud.  Below are a few features we’ve released to ensure this.

 

Virtual Network (VNet) injection of SSIS IR

When you provision your SSIS IR, you can inject/join it to a VNet, so it can access all data stores isolated/secured in the same VNet.  If you have other data stores secured in a different VNet, you can connect/peer the VNets together, so your SSIS IR can also access those other data stores. 

 

If you have data stores secured on premises, you can connect the VNet joined by SSIS IR to your on-premises network via ExpressRoute/Virtual Private Network (VPN) gateway, so your SSIS IR can have a line-of-sight to those on-premises data stores and access them.

 

If you have data stores secured with VNet service endpoints, you can join your SSIS IR to the configured VNet/subnet in the same region to access them.  For more info on VNet service endpoints, see https://docs.microsoft.com/azure/virtual-network/virtual-network-service-endpoints-overview.

 

If you want to securely access your data stores via their private endpoints without actually injecting them into a VNet, you can establish a private link from them to the VNet joined by SSIS IR, so your SSIS IR can access them.  At the time of writing, the private link feature is still in preview.  For more info, see https://docs.microsoft.com/azure/private-link/private-link-overview.

 

For more info on VNet injection of SSIS IR, see https://docs.microsoft.com/azure/data-factory/join-azure-ssis-integration-runtime-virtual-network.
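To make the VNet injection settings a little more concrete, here is a minimal Python sketch that builds the network fragment of an SSIS IR definition.  The key names (vNetProperties, vNetId, subnet) are assumed from the ARM schema for managed integration runtimes and should be checked against the doc linked above; the function name, resource ID, subnet name, and region are hypothetical.

```python
import json


def ssis_ir_compute_properties(vnet_id: str, subnet_name: str, location: str) -> dict:
    """Build the computeProperties fragment that joins an SSIS IR to a VNet.

    Key names are assumed from the ARM schema for managed integration
    runtimes; verify them against the current documentation before use.
    """
    if not vnet_id.startswith("/subscriptions/"):
        raise ValueError("vnet_id should be a full Azure resource ID")
    return {
        # The SSIS IR is expected to sit in the same region as the VNet.
        "location": location,
        "vNetProperties": {
            "vNetId": vnet_id,
            "subnet": subnet_name,
        },
    }


# Hypothetical resource ID, for illustration only.
example_vnet_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/example-rg/providers/Microsoft.Network"
    "/virtualNetworks/example-vnet"
)
fragment = ssis_ir_compute_properties(example_vnet_id, "ssis-subnet", "westeurope")
print(json.dumps(fragment, indent=2))
```

The sketch only assembles and prints a dictionary; it does not call Azure, so you would still pass the result to whichever deployment tool you use.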

 

Self-Hosted IR (SHIR) as a proxy for SSIS IR

Sometimes VNet injection of SSIS IR is difficult to implement due to the overly complex configurations/restrictive policies for your corporate network.  An alternative method to access data on premises without joining a VNet is by using Self-Hosted IR (SHIR) as a proxy for SSIS IR.  After installing SHIR behind your corporate firewall, in the same on-premises location as your data stores, you can enable this feature at the design/run time of your SSIS packages merely by switching on a single connection manager property (ConnectByProxy).
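If you prefer to switch the property on at run time rather than in the package designer, it can be set as a property override.  The sketch below builds the override path in Python; the path format is assumed from the SHIR proxy documentation, and the helper name and connection manager name are hypothetical.

```python
def property_override_path(connection_manager: str, prop: str = "ConnectByProxy") -> str:
    """Return the SSIS property path for a connection manager property.

    The path format is an assumption based on the SHIR proxy documentation.
    """
    if not connection_manager or any(c in connection_manager for c in "[]"):
        raise ValueError("connection manager name must be non-empty without brackets")
    return rf"\Package.Connections[{connection_manager}].Properties[{prop}]"


# Hypothetical connection manager name, for illustration only.
overrides = {property_override_path("MyOnPremSqlServer"): True}
for path, value in overrides.items():
    print(path, "=", value)
```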

 

This feature will automatically split an SSIS data flow task with on-premises data source into two staging tasks: the first one running on SHIR will move data from the on-premises data source into a staging area in your Azure Blob Storage, while the second one running on your SSIS IR will then move data from the staging area into the intended data destination.
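As a purely conceptual model of this split, and not how ADF implements it internally, the two staging tasks can be pictured as two functions that hand data over through a shared staging list.  All names below are hypothetical, and plain in-memory lists stand in for the on-premises source, the Azure Blob Storage staging area, and the destination.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def stage_from_source(source_rows: Iterable[Row], staging: List[Row]) -> int:
    """First task (would run on SHIR): copy source rows into the staging area."""
    count = 0
    for row in source_rows:
        staging.append(dict(row))  # copy so the source stays untouched
        count += 1
    return count


def load_from_staging(staging: List[Row], destination: List[Row]) -> int:
    """Second task (would run on SSIS IR): drain staging into the destination."""
    count = 0
    while staging:
        destination.append(staging.pop(0))
        count += 1
    return count


on_prem_source = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
blob_staging: List[Row] = []
cloud_destination: List[Row] = []

staged = stage_from_source(on_prem_source, blob_staging)
loaded = load_from_staging(blob_staging, cloud_destination)
print(f"staged={staged} loaded={loaded}")
```

The point of the model is that the second task never touches the on-premises source directly; it only reads what the first task has already staged.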

 

This feature could also unblock certain scenarios, e.g. where your data stores are secured with VNet service endpoints in regions that are not yet supported by SSIS IR.

 

For more info on SHIR as a proxy for SSIS IR, see https://docs.microsoft.com/azure/data-factory/self-hosted-integration-runtime-proxy-ssis.

 

ADF as a trusted service

In the past, we’ve released cloud-only features that let you store your data access credentials in Azure Key Vault (AKV) or use ADF managed identity for SSIS IR to access your data stores in Azure when you run your packages as Execute SSIS Package activities in ADF pipelines.  For more info, see https://docs.microsoft.com/azure/data-factory/how-to-invoke-ssis-package-ssis-activity.

 

You can enable the ADF managed identity feature at the design/run time of your SSIS packages merely by switching on a single connection manager property (ConnectUsingManagedIdentity).  For the OLE DB, ADO.NET, and Azure Storage connection managers respectively, see:

https://docs.microsoft.com/sql/integration-services/connection-manager/ole-db-connection-manager?view=sql-server-ver15#managed-identities-for-azure-resources-authentication

https://docs.microsoft.com/sql/integration-services/connection-manager/ado-net-connection-manager?view=sql-server-ver15#managed-identities-for-azure-resources-authentication

https://docs.microsoft.com/sql/integration-services/connection-manager/azure-storage-connection-manager?view=sql-server-ver15#managed-identities-for-azure-resources-authentication

 

Now, with the release of ADF as a trusted service to a growing list of Azure services, you can secure access to your data stores via private endpoints/VNet service endpoints/firewall rules and still allow SSIS IR to access them via ADF managed identity without having to join SSIS IR to a VNet.  At the time of writing, the list of Azure services includes Azure Storage and AKV. 
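For Azure Storage, this kind of lockdown might look roughly like the sketch below, which builds a networkAcls fragment that denies traffic by default but lets trusted Azure services through.  The property names follow the ARM schema for storage accounts as I understand it; the helper name and IP address are hypothetical, and you should confirm the details in the article linked below before relying on them.

```python
from typing import Iterable


def storage_network_acls(allowed_ips: Iterable[str] = (),
                         allow_trusted_services: bool = True) -> dict:
    """Build a storage account networkAcls fragment.

    Traffic is denied by default; trusted Azure services (such as ADF using
    its managed identity) are let through when allow_trusted_services is True.
    """
    return {
        "defaultAction": "Deny",
        "bypass": "AzureServices" if allow_trusted_services else "None",
        "ipRules": [{"value": ip, "action": "Allow"} for ip in allowed_ips],
        "virtualNetworkRules": [],
    }


# Hypothetical address from a documentation-only IP range.
acls = storage_network_acls(allowed_ips=["203.0.113.10"])
print(acls)
```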

 

For more info on ADF as a trusted service, see https://techcommunity.microsoft.com/t5/Azure-Data-Factory/Data-Factory-is-now-a-Trusted-Service-in-Azure-Storage-and-Azure/ba-p/964993.

 

Bring Your Own static public IP addresses (BYOIP) for SSIS IR

In the past, you couldn’t allow the public IP addresses of SSIS IR through the firewall of your data stores, because they’re dynamic and come from a large, growing range of IP addresses used by our underlying Azure infrastructure.  Now, with the release of the BYOIP feature, you can create two static public IP addresses in the Azure portal, assign them to your SSIS IR when joining it to a VNet, and allow them in the firewall rules for your data stores.  Your static public IP addresses must be new/unused ones of the Standard type (not Basic) and must have DNS names.

 

For more info on BYOIP for SSIS IR, see https://docs.microsoft.com/azure/data-factory/join-azure-ssis-integration-runtime-virtual-network#access-to-data-sources-protected-by-ip-firewall-rule.
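If you would rather script the two addresses than create them by hand, the requirements above (static, Standard type, with a DNS name) map onto a public IP resource definition roughly as sketched below.  The property names follow the ARM schema for public IP addresses as I understand it; the function names, name prefix, and region are hypothetical.

```python
from typing import List


def static_public_ip(name: str, location: str, dns_label: str) -> dict:
    """Build a Standard, static public IP definition with a DNS name."""
    return {
        "type": "Microsoft.Network/publicIPAddresses",
        "name": name,
        "location": location,
        "sku": {"name": "Standard"},  # Basic is not accepted for BYOIP
        "properties": {
            "publicIPAllocationMethod": "Static",
            "dnsSettings": {"domainNameLabel": dns_label},
        },
    }


def byoip_pair(prefix: str, location: str) -> List[dict]:
    """Build the two public IP definitions that an SSIS IR needs."""
    return [
        static_public_ip(f"{prefix}-{i}", location, f"{prefix}-{i}")
        for i in (1, 2)
    ]


# Hypothetical prefix and region, for illustration only.
pair = byoip_pair("ssisir-ip", "westeurope")
for ip in pair:
    print(ip["name"], ip["sku"]["name"], ip["properties"]["publicIPAllocationMethod"])
```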

 

VNet injection of SSIS IR vs. SHIR as a proxy for SSIS IR

As mentioned above, to access data on premises from packages running on SSIS IR in ADF, we provide two alternative methods: VNet injection of SSIS IR and SHIR as a proxy for SSIS IR (see the diagram below).

 

[Diagram: VNet injection of SSIS IR vs. SHIR as a proxy for SSIS IR]

 

These methods have their own pros and cons (see the table below).

 

Pros for VNet injection | Cons for SHIR as a proxy
SSIS IR is managed by Microsoft (PaaS) | SHIR is managed by you (non-PaaS)
Supports all SSIS components | Supports only a few SSIS components
Employs direct data transfers without staging | Employs indirect data transfers with staging

Cons for VNet injection | Pros for SHIR as a proxy
May require complex configurations related to network permissions/roles, resource group, IP addresses, subnet, DNS, NSG, UDR, etc. | Requires simple on-premises installation/network configurations
Dependent on SSIS IR regional availability | Independent of SSIS IR regional availability
May require some/more exemptions from company-specific network policies | Requires fewer or no exemptions from company-specific network policies
On-premises-to-cloud data transfers via ExpressRoute/SMB may not meet company-specific encryption requirements | On-premises-to-cloud data transfers via HTTPS are more likely to meet company-specific encryption requirements
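One rough way to read these trade-offs is as a two-question decision: are the components in your packages supported by the proxy, and is VNet injection feasible given your region and network policies?  The sketch below encodes that reading in Python.  It is a simplification of my own, not official guidance, and the function and parameter names are hypothetical.

```python
def suggest_access_method(components_supported_by_proxy: bool,
                          vnet_injection_feasible: bool) -> str:
    """Suggest a data access method from two yes/no questions.

    vnet_injection_feasible bundles regional availability and whether your
    network policies permit the required configuration.
    """
    if vnet_injection_feasible and not components_supported_by_proxy:
        return "VNet injection of SSIS IR"
    if components_supported_by_proxy and not vnet_injection_feasible:
        return "SHIR as a proxy for SSIS IR"
    if vnet_injection_feasible and components_supported_by_proxy:
        return "either (weigh direct transfers against simpler setup)"
    return "neither fits cleanly; revisit network policy or package design"


print(suggest_access_method(components_supported_by_proxy=True,
                            vnet_injection_feasible=False))
```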

 

In the near future, we'll reduce the cons for both methods, e.g. simplifying network configurations and expanding SSIS IR regional availability for VNet injection, as well as supporting more SSIS components for SHIR as a proxy, so watch this space!

 

I hope you’ll find these features useful.  Please don’t hesitate to contact us if you have any feedback, questions, or issues.  Thank you, as always, for your support.

 
