Understanding HADR in Lync Server 2013

This post has been republished via RSS; it originally appeared at: Skype for Business Blog articles.

First published on TECHNET on Sep 04, 2013

Abstract: With the introduction of new features in Lync Server 2013, IT administrators and partners can provide users a rich unified communications experience that is highly resilient to single points of failure. However, failures can – and do – happen, so the product enables a set of recovery services that allow for minimal data loss and swift re-enablement of services in the cases of server or datacenter outages.

Author: Marc Perez

Technical Review: Thomas Binder

Product version: Lync Server 2013

Publication date: September 2013

High Availability

A system is considered highly available if it can tolerate the loss of one or more of its subcomponents and still provide service. Lync Server 2013 achieves high availability through a number of methods, two in particular: the replicated distribution of user sets and user data across multiple Front End servers in a pool, and the redundancy of Back End servers via SQL Mirroring (preferred) or SQL Clustering**. Together, these two functions ensure that the failure of any one Front End or Back End server (including its storage) no longer presents a single point of failure for the entire system.

Since these features depend on the presence of multiple servers within a single pool to keep both Front End and Back End components available, Standard Edition, which is an all-in-one instance of Lync Server, offers no high availability. By default, it is a single point of failure, because the mechanisms for replicating data and recovering all services in an automated fashion are not available if the server fails. When a Standard Edition server fails, effectively all the Front Ends and Back Ends in that pool fail with it.

Variations in deployment configuration determine where the solution provides automated resiliency and where it needs manual intervention. If manual intervention is required, the entire solution cannot be considered “highly available”, as users would experience a service interruption until an administrator invoked the required manual process. Additionally, consideration should be given to server maintenance and to other server roles: maintenance on a Lync Server in a two Front End pool eliminates high availability during the maintenance window, and without a SQL Witness there can be no automated failover and failback for the Back End SQL Mirror.

** While SQL Clustering is now supported by Lync Server 2013, SQL Mirroring, which can be configured and managed by Lync Server 2013 itself, is the preferred solution. For more on SQL Clustering support, see Database Software Support.

Lync Server Edition	Configuration	High Availability
Standard	Single Server	None
Standard	Paired SE pools (in data center)	Automatic for Resiliency Mode *
Enterprise	Single FE, Single BE	None
Enterprise	Single FE, Paired BEs (SQL Mirror)	None
Enterprise	Single FE, Paired BEs (SQL Mirror) + Witness	Automated Backend Failover only
Enterprise	Two FEs, Single BE	HA for Lync only w/o BE failure
Enterprise	Two FEs, Paired BEs (SQL Mirroring)	HA for Lync only w/o BE failure
Enterprise	Two FEs, Paired BEs (SQL Mirror) + Witness or SQL Cluster	Full HA during non-maintenance
Enterprise	Three+ FEs, Paired BEs (SQL Mirror) + Witness or SQL Cluster	Full HA

Table 1 - High Availability matrix by Deployment Configuration

* The Lync client will eventually utilize a backup registrar for voice if so configured, but some delay should be expected between lost connection to the home pool and the successful retry. See Planning for Central Site Resiliency in the TechNet Library for more information on configuring registrar intervals.
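
Table 1 can be read as a small decision procedure. The following Python sketch is only an illustration of that reading, not part of the product: the `Deployment` class, its field names, and the `high_availability` function are hypothetical, and the returned strings are copied from the table. Combinations that Table 1 does not list (for example, a single Front End with a SQL Cluster) are reported as such rather than guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical description of one row of Table 1."""
    edition: str                      # "Standard" or "Enterprise"
    front_ends: int = 1               # number of Front End servers in the pool
    mirrored_back_end: bool = False   # paired Back Ends using SQL Mirroring
    witness: bool = False             # SQL Witness present for the mirror
    sql_cluster: bool = False         # Back End is a SQL Cluster
    paired_se_pool: bool = False      # Standard Edition pool paired with another


def high_availability(d: Deployment) -> str:
    """Return the Table 1 'High Availability' value for a deployment."""
    if d.front_ends < 1:
        raise ValueError("a pool needs at least one Front End")

    if d.edition == "Standard":
        return "Automatic for Resiliency Mode" if d.paired_se_pool else "None"

    # Enterprise Edition with a single Front End.
    if d.front_ends == 1:
        if d.sql_cluster:
            return "Not listed in Table 1"
        if d.mirrored_back_end and d.witness:
            return "Automated Backend Failover only"
        return "None"

    # Enterprise Edition with two or more Front Ends.
    automated_back_end = d.sql_cluster or (d.mirrored_back_end and d.witness)
    if not automated_back_end:
        return "HA for Lync only w/o BE failure"
    return "Full HA" if d.front_ends >= 3 else "Full HA during non-maintenance"
```

For example, `high_availability(Deployment("Enterprise", front_ends=2, mirrored_back_end=True))` returns "HA for Lync only w/o BE failure", because without a Witness the mirror cannot fail over automatically.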

Disaster Recovery

Service outages are still possible when a hardware failure affects an entire pool (Standard Edition server failure, network appliance, server rack, etc.) or when there is a location-based challenge (such as a network outage in a particular datacenter). In these cases, re-establishing services quickly and with minimal data loss is the focus of disaster recovery planning. Lync Server 2013 supports two disaster recovery functions via “pool pairing”: site resiliency and pool failover.

In site resiliency, users of one Lync 2013 pool can be configured to automatically connect to a backup pool for resiliency mode services (a subset of full production features) when their own pool is unavailable. This period of unavailability is now configurable for both basic SIP connectivity and Voice services (see the footnote to Table 1 for more). For pool failover, users are manually moved from one pool to another (failover) and then back (failback). While there is no automation for either failover or failback, Lync Server 2013 introduces data replication between paired pools during regular (non-disaster) service. This real-time persistent data replication enables a faster recovery of services with minimal risk of data loss in the event of a site (datacenter) failure.

Lync Server Edition	Configuration	Recovery Enabled
Standard	Paired SE Pools	Site Resiliency (Automated) and Pool Failover (RTO/RPO of 30min after manual initiation)
Enterprise	Paired EE Pools	Site Resiliency (Automated) and Pool Failover (RTO/RPO of 30min after manual initiation)

Table 2 – Site Resiliency by Product Edition

Note: While SQL Clustering is now supported, Metropolitan Site Resiliency remains unsupported for Lync Server 2013. All the nodes in a SQL Cluster serving a Lync pool, as well as the associated Front End servers, should be deployed within the same physical site represented within Topology Builder.

Configuration and Considerations

From an overall solution standpoint, there are “best practices” for how to pair pools, such as pairing only like with like: same editions (EE pools paired with EE pools, SE pools with SE pools), same platforms (hardware paired with hardware, virtual paired with virtual), etc. Furthermore, it is recommended to pair pools within geographic regions to mitigate challenges with performance across WANs. When a set of users is failed over from one pool to another, their conferences are hosted on the new pool until they are failed back. If the failover is from one continent to another, all users joining the conference, even if local to each other, will traverse the WAN to join the conference hosted in the failover pool. Since elements like Call Admission Control settings and Direct Inward Dial (DID) numbers are tied to pools and are not easily transferred from one region to another (such as from North America to Europe), even an organization with robust WAN links should consider such a deployment carefully.

Finally, there are often users homed on Survivable Branch Appliances (SBAs), typically at remote office locations. As in Lync 2010, SBAs can be paired to a Lync pool in 2013 for failover. In the event of an SBA failure, users homed on the SBA can have their clients redirect to the Lync pool for many services:

Users	Configuration	Resiliency Achieved
Homed on SBA	SBA paired with pool, both functional	All Services
Homed on SBA	SBA paired with pool, pool fails	Resiliency Mode
Homed on SBA	SBA paired with pool, SBA fails	All Services

Table 3 – Resiliency with SBAs

* While SBAs can be paired to a Lync pool, they are not capable of utilizing pool failover services. For example, if an SBA is paired with Pool A, which is also paired with Pool B, and Pool A fails, users of Pool A will redirect to the backup registrar of Pool B but SBA users will not.
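
The rows of Table 3 can be expressed as a short lookup. The Python sketch below is illustrative only: the function name `sba_user_service` and its boolean parameters are hypothetical, and the returned strings come from the table. The case where both the SBA and its paired pool are down does not appear in Table 3, so the sketch reports it as not covered instead of assuming an outcome.

```python
def sba_user_service(sba_up: bool, paired_pool_up: bool) -> str:
    """Return the Table 3 'Resiliency Achieved' value for users homed on an SBA.

    Per the footnote to Table 3, SBA users do not follow the paired pool
    to that pool's own backup pool, so only the SBA and the directly
    paired pool are considered here.
    """
    if sba_up and paired_pool_up:
        return "All Services"
    if sba_up:
        # The paired pool has failed; the SBA keeps users registered.
        return "Resiliency Mode"
    if paired_pool_up:
        # The SBA has failed; clients redirect to the paired pool.
        return "All Services"
    # Both are down: Table 3 has no row for this case.
    return "Not covered by Table 3"
```

Calling `sba_user_service(sba_up=True, paired_pool_up=False)` corresponds to the second row of the table and returns "Resiliency Mode".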
