
Creating a custom disaster recovery plan for your Synapse workspace Part 1

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

Many of our customers have been asking about creating a disaster recovery plan for their Synapse Workspace. In a new blog series, we will cover the basics of disaster recovery and business continuity, discussing available options and custom solutions.


In this first post, we'll review important concepts and questions to answer before building a disaster recovery plan, including the differences between High Availability and Disaster Recovery.


High Availability

High availability (HA) refers to a system's ability to operate continuously without failing for a specified time. While it's impossible to achieve 100% availability, system or application design should consider three key principles to minimize service/system interruptions:

By adhering to these principles, the system can quickly recover and continue functioning even in case of a failure or outage.
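To make the "less than 100% availability" point concrete, an availability target can be converted into a yearly downtime budget. The following is a minimal sketch; the function name and the sample targets are illustrative assumptions, not values taken from any Synapse SLA.

```python
# Illustrative arithmetic only: the targets below are example values,
# not figures from a Synapse SLA.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the yearly downtime budget, in hours, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * SECONDS_PER_YEAR / 3600


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print(f"{target}% availability allows about {hours:.2f} hours of downtime per year")
```

Each extra "nine" cuts the downtime budget by a factor of ten, which is why higher targets demand progressively more redundancy and automation.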


To achieve high availability in the Dedicated SQL Pool engine, we implement internal monitors that regularly check the health of the service.

This monitoring is built in and enabled by default; there is no need, and no way, to enable or customize it.

Disaster recovery

Disaster recovery is the process of keeping vital infrastructure or systems running after an unexpected event such as a natural disaster, a hardware failure, or data corruption. To achieve this, we use policies, tools, and procedures to respond systematically to such events. While high availability ensures all essential business aspects keep functioning despite disruptive events, disaster recovery focuses on creating plans to support critical business functions. In disaster recovery, the primary system location is assumed to be unavailable, so the system needs to be moved elsewhere. Two key targets in disaster recovery planning are the recovery time objective and the recovery point objective.


Recovery Time Objective

The Recovery Time Objective (RTO) is the maximum acceptable time to restore services and systems after a disaster so that the break in service continuity ends. In other words, it's the target duration within which the system must be available and functional again after a disruption.
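As a rough illustration, the result of a recovery drill can be checked against an RTO by comparing two timestamps. This is a minimal sketch; the function name, the timestamps, and the 4-hour RTO are hypothetical values chosen for the example.

```python
from datetime import datetime, timedelta


def meets_rto(outage_start: datetime, service_restored: datetime, rto: timedelta) -> bool:
    """Return True if the service came back within the RTO."""
    if service_restored < outage_start:
        raise ValueError("service_restored cannot be earlier than outage_start")
    return service_restored - outage_start <= rto


if __name__ == "__main__":
    # Hypothetical drill: outage at 09:00, service restored at 11:30.
    outage = datetime(2023, 5, 1, 9, 0)
    restored = datetime(2023, 5, 1, 11, 30)
    print(meets_rto(outage, restored, rto=timedelta(hours=4)))
```

Recording these two timestamps during every drill gives a measured recovery time to compare with the objective, rather than an estimate.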

Recovery Point Objective

The Recovery Point Objective (RPO) is the maximum amount of data loss that is acceptable when restoring a service. For instance, if the RPO is measured in minutes, any transactions that have occurred in that time frame may not be recovered, resulting in acceptable data loss when restoring the service.
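To see how an RPO relates to restore points, the data loss for a given failure is the time between the failure and the latest restore point taken before it. The sketch below assumes a hypothetical list of restore point timestamps; the schedule and the RPO values are example inputs, not the actual Synapse snapshot rules.

```python
from datetime import datetime, timedelta
from typing import Iterable


def data_loss_window(restore_points: Iterable[datetime], failure_time: datetime) -> timedelta:
    """Return how much recent work is lost when restoring after a failure.

    The loss is the gap between the failure and the newest restore point
    taken at or before the failure.
    """
    usable = [point for point in restore_points if point <= failure_time]
    if not usable:
        raise ValueError("no restore point precedes the failure")
    return failure_time - max(usable)


def meets_rpo(restore_points: Iterable[datetime], failure_time: datetime, rpo: timedelta) -> bool:
    """Return True if the data loss for this failure stays within the RPO."""
    return data_loss_window(restore_points, failure_time) <= rpo


if __name__ == "__main__":
    # Hypothetical schedule: one restore point every 8 hours.
    day = datetime(2023, 5, 1)
    points = [day + timedelta(hours=h) for h in (0, 8, 16)]
    failure = day + timedelta(hours=19, minutes=30)
    print(data_loss_window(points, failure))
    print(meets_rpo(points, failure, rpo=timedelta(hours=1)))
```

If the computed window regularly exceeds the RPO the business needs, more frequent (for example, user-defined) restore points or a different replication strategy is required.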


DNS Switchover

DNS is a protocol, carried mainly over UDP with TCP used for larger responses, that translates human-readable hostnames into IP addresses. DNS Switchover is the capability of ensuring that applications or network services remain accessible in case of an outage. This is achieved by providing two or more IP addresses in a DNS record, each representing an identical server. This allows traffic to be moved from a failing server to a live, redundant server with minimal human intervention.
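The idea of moving traffic to a live, redundant server can be sketched from the client side: given the addresses behind a record, use the first one that passes a health probe. This is a simplified illustration, not how any specific DNS service performs switchover; the function names, the example addresses, and the use of port 1433 as the probe target are assumptions.

```python
import socket
from typing import Callable, Iterable, Optional


def tcp_port_open(address: str, port: int = 1433, timeout: float = 2.0) -> bool:
    """Health probe: True if a TCP connection to address:port succeeds.

    Port 1433 is assumed here only because it is the usual SQL endpoint port.
    """
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_live_address(
    addresses: Iterable[str],
    is_healthy: Callable[[str], bool] = tcp_port_open,
) -> Optional[str]:
    """Return the first address that passes the health probe, or None."""
    for address in addresses:
        if is_healthy(address):
            return address
    return None


if __name__ == "__main__":
    # Hypothetical primary and secondary addresses (documentation ranges).
    candidates = ["192.0.2.10", "198.51.100.10"]
    # A stub probe stands in for a real network check: the primary is "down".
    primary_down = lambda address: address != "192.0.2.10"
    print(pick_live_address(candidates, is_healthy=primary_down))
```

Passing the probe as a parameter keeps the selection logic testable without network access; a real setup would typically rely on the DNS provider's own health checks and short record TTLs rather than client-side probing.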


When building a disaster recovery plan for our workspace, it's important to answer certain questions before deciding whether a custom DR plan is necessary.


Answering the above questions helps us determine whether we need a custom plan to meet our BC/DR requirements.


Synapse Dedicated Pools – Initial concepts

For the Dedicated Pools, we create data warehouse snapshots that you can use to recover or copy your data warehouse to a previous state. If you want to customize the snapshot window, you can create user-defined restore points by taking a user-created snapshot.


Per SLA requirements, Dedicated SQL Pools have automatic system snapshots, or restore points, that follow some rules:


For more details, I suggest you check out the following documentation: "Backup and restore - snapshots, geo-redundant".


Azure Data Lake

Azure provides the capability to enable soft delete for your storage account, allowing you to recover data that was unintentionally deleted or overwritten within a configurable retention period. It is recommended to enable this feature as part of your disaster recovery plan to prevent data loss.


In terms of disaster recovery for your Data Lake, you can leverage the Azure Backup service to enable offsite backups of your Data Lake store to another region, ensuring that you have a copy of your data that can be recovered in the event of a disaster. Additionally, you can configure data replication between regions to provide further resiliency and availability.


It's important to regularly test your disaster recovery plan to ensure that your data can be recovered, and your systems can be restored in the event of a disaster. This includes testing your backups, recovery procedures, and failover capabilities to ensure that they work as expected. By regularly testing your disaster recovery plan, you can identify and address any issues before a real disaster occurs, minimizing the impact on your business operations.


From our documentation:


I highly recommend you check out the following links to help you understand how your data lake will behave during a disaster.


In our next post, we will examine different approaches to establishing a custom disaster recovery plan for our Dedicated SQL Pools. Later in the series, we will also cover specific aspects of the Serverless Pools, Pipelines, and Spark Pools.


Be sure to stay tuned for more valuable insights on how to effectively implement Disaster Recovery strategies for your Azure data platform.


Our team publishes blogs regularly, and you can find all of them here:

For a deeper understanding of Synapse implementation best practices, please refer to our Success by Design (SBD) site:
