How Data Exfiltration Protection (DEP) impacts Azure Synapse Analytics Pipelines

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.


Author: Luke Moloney is a Senior Program Manager in the Azure Synapse Customer Success Engineering (CSE) team.

 

Data Exfiltration Protection (DEP) is a feature that enables additional restrictions on the ability of Azure Synapse Analytics to connect to other services – enabling you to further secure your Azure Synapse Analytics deployment. There are a couple of key things to know about DEP:

  1. DEP can only be enabled at Azure Synapse Analytics workspace creation and cannot be disabled at a later point. If you want to disable DEP, you will have to create a new Azure Synapse Analytics workspace and migrate all artifacts.
  2. DEP enables you to limit communication from Azure Synapse Analytics by requiring connections to other services to use managed private endpoints and to target only approved Azure AD tenants.
  3. DEP applies to all services within an Azure Synapse Analytics workspace, including dedicated SQL pools, serverless SQL pools, Apache Spark pools and Pipelines.

 

This article will focus specifically on how DEP impacts the use of Synapse Pipelines. Azure Data Factory does not currently support deployment with DEP.

 

Enabling DEP

DEP can only be enabled at the creation of an Azure Synapse Analytics workspace. It is enabled by selecting ‘Allow outbound data traffic only to approved targets’. This option is only available when creating a workspace with the ‘Managed virtual network’ option enabled. Both options are selected within the networking tab of Azure Synapse Analytics workspace creation. These parameters are also available when programmatically deploying an Azure Synapse Analytics workspace (e.g. ARM Template, CLI). You can learn more about creating an Azure Synapse Analytics workspace with DEP at Create a workspace with data exfiltration protection enabled - Azure Synapse Analytics.
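For programmatic deployment, the two networking options can be sketched as a fragment of an ARM-style resource definition. This is a minimal illustration rather than a complete template: the property names (managedVirtualNetwork, managedVirtualNetworkSettings, preventDataExfiltration, allowedAadTenantIdsForLinking) and the API version are my understanding of the Microsoft.Synapse/workspaces schema and should be checked against the current reference. The helper name, workspace name and tenant ID are hypothetical, and other required properties (such as the default Data Lake storage account) are omitted.

```python
import json


def build_dep_workspace_resource(name, location, allowed_tenant_ids=()):
    """Return a partial ARM-style resource with managed VNet and DEP enabled.

    Assumption: property names follow the Microsoft.Synapse/workspaces schema.
    Required properties unrelated to DEP are intentionally left out.
    """
    return {
        "type": "Microsoft.Synapse/workspaces",
        "apiVersion": "2021-06-01",  # assumed API version; verify before use
        "name": name,
        "location": location,
        "properties": {
            # DEP requires the managed virtual network to be enabled.
            "managedVirtualNetwork": "default",
            "managedVirtualNetworkSettings": {
                # Corresponds to 'Allow outbound data traffic only to approved targets'.
                "preventDataExfiltration": True,
                # Additional approved Azure AD tenants (the home tenant is implicit).
                "allowedAadTenantIdsForLinking": list(allowed_tenant_ids),
            },
        },
    }


# Hypothetical values, for illustration only.
resource = build_dep_workspace_resource(
    "contoso-synapse", "westeurope", ["00000000-0000-0000-0000-000000000000"]
)
print(json.dumps(resource, indent=2))
```

Because DEP cannot be turned off after creation, it is worth reviewing a definition like this before the first deployment rather than after.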

 

Important Synapse Pipelines concepts to understand

Before we discuss how DEP applies to Synapse Pipelines, it is important to level-set on some Synapse Pipelines specific concepts – if you are familiar with Synapse Pipelines or Azure Data Factory you can skip over this section and jump to Synapse Pipeline connectivity without DEP enabled.

 

For a more generalized introduction to Synapse Pipelines check out this doc article.

 

Synapse Pipelines enables users to connect to a range of different data services, through what is called a Linked Service. Synapse Pipelines supports a wide range of connectors to different services including:

  • Azure services – such as Azure Storage, Azure SQL Database, Azure Database for PostgreSQL, Azure Data Explorer and Azure Cosmos DB.
  • Services from other cloud providers such as Amazon Web Services S3, Google Cloud Storage, Amazon RDS for Oracle and Google BigQuery.
  • Third-party SaaS platforms such as HubSpot, Salesforce, SAP Cloud for Customer and Xero.
  • External APIs such as REST and OData.
  • On premises systems such as SQL Server, PostgreSQL, Oracle, IBM DB2 and ODBC sources.

A full list of the supported connectors is available with this link.

 

When a user creates a Linked Service, they must choose an Integration Runtime, which will execute any activity that uses that Linked Service. There are two types of Integration Runtimes:

  1. Azure Integration Runtime (AIR)
    An Azure Integration Runtime is where Azure provides the necessary compute in a serverless manner. This means you can execute a pipeline without having to provision any infrastructure to run the Integration Runtime.
  2. Self-Hosted Integration Runtime (SHIR)
    The Self Hosted Integration Runtime allows you to host / run the integration runtime on infrastructure you control and manage. This allows an integration runtime to be hosted on-premises, in Azure VMs or in other cloud providers.

 

It’s important to note that there are some differences between AIRs and SHIRs – most notably that Data Flows can only be executed on AIRs. For more information, including some of the feature differences, please read https://docs.microsoft.com/en-us/azure/data-factory/concepts-integration-runtime.

 

It should also be noted that some Linked Services can only be used with certain Integration Runtime types; read Pipelines and activities - Azure Data Factory & Azure Synapse for more details.

 

Synapse Pipeline connectivity without DEP enabled

Without DEP enabled, users who have appropriate privileges within an Azure Synapse Analytics workspace can run pipelines which connect to a range of different services (through Linked Services), using an Azure Integration Runtime.

Therefore, without DEP, an appropriately permissioned user may be able to read data from or write data to a Linked Service in a way which violates an organization’s policy. This could occur due to a compromised account, a malicious user or a lack of awareness of the organization’s policy.

 

It’s important to note that DEP is only one layer of protection that applies to Azure Synapse Analytics. Review the Azure Synapse Analytics Security Whitepaper for more information on the multiple layers of security within Azure Synapse Analytics.

 

Synapse Pipeline connectivity with DEP enabled

With DEP enabled, the behavior outlined above changes. When using the Azure Integration Runtime, DEP limits connections from Synapse Pipelines to services in specified Azure AD tenants, and those connections must go through managed private endpoints.

 

By default, the Azure AD tenant within which the Azure Synapse Analytics workspace is created is allowed and does not need to be added for connectivity within the same Azure AD tenant to work. You can also configure additional Azure AD tenants you would like to allow connections to. This can be done at the point of workspace creation or at any point after that.

 

When using the Azure Integration Runtime with DEP enabled, Linked Service connections (that is to say, connections to other services) must occur through managed private endpoints. The services which are supported within Azure Synapse Analytics managed private endpoints (at the time of writing) are:

  • Azure Storage (including Blob, Data Lake Storage Gen 2, Queue, Table and File)
  • Azure SQL Database
  • Azure SQL Managed Instance (in preview)
  • Azure Cosmos DB (SQL and Mongo API)
  • Azure Key Vault
  • Azure Search
  • Azure Database for PostgreSQL
  • Azure Database for MariaDB
  • Azure Database for MySQL
  • Azure Functions
  • Azure Cognitive Services

 

For more information on how to set up a managed private endpoint within an Azure Synapse Analytics workspace, check out this link. It should be noted that this process requires appropriate permissions both within Azure Synapse Analytics and within the service you are connecting to. In Azure Synapse Analytics, users will require ‘workspaces/managedPrivateEndpoint/write, delete’ permissions, which the Synapse Administrator and Synapse Linked Data Manager roles have with Synapse RBAC.
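As a rough illustration of what a managed private endpoint request contains, the sketch below builds a definition that points at a target resource. The property names privateLinkResourceId and groupId reflect my understanding of the managed private endpoint payload and should be verified against the documentation; the helper name, endpoint name and resource ID are hypothetical.

```python
import json


def managed_private_endpoint(name, target_resource_id, group_id):
    """Return a managed private endpoint definition (assumed payload shape)."""
    return {
        "name": name,
        "properties": {
            # Full Azure resource ID of the service to connect to.
            "privateLinkResourceId": target_resource_id,
            # Sub-resource of the target, e.g. "blob" or "dfs" for Azure Storage.
            "groupId": group_id,
        },
    }


# Hypothetical storage account in the same (approved) Azure AD tenant.
storage_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.Storage/storageAccounts/<account-name>"
)
endpoint = managed_private_endpoint("mpe-datalake", storage_id, "dfs")
print(json.dumps(endpoint, indent=2))
```

Creating the endpoint is only half of the process: the owner of the target resource still has to approve the private endpoint connection before it can be used.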

 

Constraints when DEP is enabled

Because DEP restricts which connections can be made to other services and how they are made, Linked Services which do not support managed private endpoints cannot be used with an Azure Integration Runtime.

 

This table provides a high-level summary of whether a Linked Service will work within Azure Synapse Analytics with DEP enabled.

 

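|                                     | Service is not supported with Synapse Managed Private Hub | Service is supported within Synapse Managed Private Hub |
|-------------------------------------|-----------------------------------------------------------|---------------------------------------------------------|
| Outside an approved Azure AD tenant | Not accessible                                            | Not accessible                                          |
| Within an approved Azure AD tenant  | Not accessible                                            | Accessible once a managed private endpoint is created.  |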
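
The table above can also be read as a small decision function. The sketch below is only a model of the rules described in this article, not an API that Azure Synapse Analytics exposes; the function name, parameters and return strings are hypothetical.

```python
def dep_access(uses_azure_ir, tenant_approved, supports_managed_private_endpoint,
               endpoint_created=False):
    """Model whether a Linked Service is reachable in a DEP-enabled workspace."""
    if not uses_azure_ir:
        # DEP does not impact Self-Hosted Integration Runtimes.
        return "Not restricted by DEP"
    if not tenant_approved or not supports_managed_private_endpoint:
        # Three of the four cells in the table.
        return "Not accessible"
    if endpoint_created:
        return "Accessible"
    return "Accessible once a managed private endpoint is created"


# A third-party REST API does not support managed private endpoints.
print(dep_access(True, tenant_approved=False, supports_managed_private_endpoint=False))
# Azure SQL Database in the workspace's own tenant, endpoint already approved.
print(dep_access(True, True, True, endpoint_created=True))
```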

 

Some common scenarios that will not work when using the Azure Integration Runtime include:

  • Calling external REST APIs such as:
    • Using a Web activity to orchestrate a refresh of a Power BI dataset
    • Using a Copy activity to copy data from a third-party REST API
  • Copying data from Amazon S3
  • Copying data from Dynamics 365
  • Copying data from a SharePoint Online list

 

Ways to address DEP constraints

It’s important to note that any approach to working around the constraints of DEP should be examined as part of a security review, to ensure that your Azure Synapse Analytics deployment remains compliant with your organizational policies and requirements.

The primary way to address the constraints of DEP when using Synapse Pipelines is to leverage the Self-Hosted Integration Runtime. As a Self-Hosted Integration Runtime is deployed on infrastructure you manage, this allows you / your organization to fully control – through traditional networking controls (e.g. Proxy, outbound Firewall) – which endpoints it can connect to. DEP does not impact the behavior of Self-Hosted Integration Runtimes.

 

Therefore, if you need to connect to endpoints which are not available when using DEP, you can choose to execute that activity on a Self-Hosted Integration Runtime instead of the Azure Integration Runtime. The ability to log, control and limit a Self-Hosted Integration Runtime should help ensure that your organization’s compliance, regulatory or other policy requirements can be met.
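
In practice, the choice of runtime is made on the Linked Service, through its connectVia reference. The sketch below builds a REST Linked Service definition bound to a Self-Hosted Integration Runtime. The JSON shape (connectVia, IntegrationRuntimeReference, the RestService type and its typeProperties) reflects my understanding of the Linked Service format and should be checked against the connector documentation; the helper name, runtime name and URL are hypothetical.

```python
import json


def linked_service_on_runtime(name, service_type, type_properties, runtime_name):
    """Return a Linked Service definition pinned to a named Integration Runtime."""
    return {
        "name": name,
        "properties": {
            "type": service_type,
            "typeProperties": dict(type_properties),
            # connectVia selects the Integration Runtime used for this connection.
            "connectVia": {
                "referenceName": runtime_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


# Hypothetical third-party REST API, reached through a SHIR named "OnPremSHIR".
rest_service = linked_service_on_runtime(
    "ThirdPartyRestApi",
    "RestService",
    {
        "url": "https://api.example.com/",
        "enableServerCertificateValidation": True,
        "authenticationType": "Anonymous",
    },
    "OnPremSHIR",
)
print(json.dumps(rest_service, indent=2))
```

Outbound traffic for this Linked Service then originates from the machine hosting the SHIR, so the proxy and firewall rules on that machine, rather than DEP, decide whether the endpoint is reachable.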

 

Should you use DEP?

If you need the protections that DEP provides – then yes of course you should enable DEP. If you don’t need those guarantees, then you should very carefully consider the constraints DEP will impose on your Azure Synapse Analytics workspace and whether they make sense given the scope and vision for your Azure Synapse Analytics project.

 

DEP imposes a particular set-up of network security controls, but within Azure Synapse Analytics network security is simply one of many layers of security. You can find out more information about how Azure Synapse Analytics works with the other layers in our security whitepaper available here. For many customers these constraints are not worth the advantages, and a combination of appropriate source control, release process and RBAC controls meets their needs.

 

Closing thoughts and resources

As you can see DEP can provide additional protections for your Azure Synapse Analytics deployments, but these protections come with capability trade-offs. You can find out more information about DEP at Data exfiltration protection for Azure Synapse Analytics workspaces - Azure Synapse Analytics | Microsoft Docs.

 

My colleague Vengatesh has a number of videos available on the Azure Synapse Analytics YouTube channel which can further your learnings.

 

For those of you just getting started with Azure Synapse Analytics I would highly recommend our Azure Synapse Success by Design guidance, which includes a great Proof of concept playbook and our implementation success methodology.

 

Finally – we’d love for you to leave a comment on how you found this blog, any experiences you have had with DEP and any future topics you'd like to see covered.

Our team publishes blog(s) regularly and you can find all these blogs here: https://aka.ms/synapsecseblog

 

For a deeper understanding of Synapse implementation best practices, please refer to our Success By Design (SBD) site: https://aka.ms/Synapse-Success-By-Design
