Understanding HADR in Lync Server 2013

This post has been republished via RSS; it originally appeared at: Skype for Business Blog articles.

First published on TECHNET on Sep 04, 2013

Abstract: With the new features introduced in Lync Server 2013, IT administrators and partners can provide users with a rich unified communications experience that is highly resilient to single points of failure. However, failures can and do happen, so the product provides a set of recovery services that allow for minimal data loss and swift restoration of services in the event of server or datacenter outages.

High Availability

A system is considered highly available if it can tolerate the loss of one or more of its subcomponents and still provide service. Lync Server 2013 achieves high availability in a number of ways, two of which are central: the replicated distribution of user sets and user data across multiple Front End servers in a pool, and the mirroring of Back End servers via SQL Mirroring (preferred) or SQL Clustering**. Together, these two functions can ensure that the failure of any one Front End or Back End server (including its storage) no longer presents a single point of failure for the entire system.

Because these features depend on multiple servers within a single pool to keep both Front End and Back End components available, Standard Edition, which is an all-in-one instance of Lync Server, offers no high availability. By default, it is a single point of failure: the mechanisms for replicating data and recovering all services automatically are not available if the server fails. When a Standard Edition server fails, the Front End and Back End roles it hosts for that pool effectively fail with it.

Variations in deployment configuration determine where the solution provides automated resiliency and where it requires manual intervention. If manual intervention is required, the solution as a whole cannot be considered “highly available”, because users experience a service interruption until an administrator carries out the required manual process. Also consider server maintenance and other server roles: taking a server down for maintenance in a two-Front End pool eliminates high availability for the duration of the maintenance window, and without a SQL witness there can be no automated failover and failback for the Back End SQL mirror.

** While SQL Clustering is now supported by Lync Server 2013, SQL Mirroring, which Lync Server 2013 can configure and manage, is the preferred solution. For more on SQL Clustering support, see Database Software Support.
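To make the witness requirement concrete, here is a minimal Python sketch of the quorum rule behind automatic failover in SQL Server database mirroring: when the principal is lost, the mirror takes over on its own only if a witness is configured and the mirror and witness can still see each other (two of three votes). The function and parameter names are hypothetical, not a SQL Server or Lync Server API, and the sketch assumes a high-safety (synchronous) session with synchronized databases.

```python
def can_auto_failover(principal_up: bool, mirror_up: bool,
                      witness_configured: bool, witness_up: bool) -> bool:
    """Return True if the mirror would take over the principal role automatically.

    Hypothetical helper, not a SQL Server or Lync Server API. Assumes the
    mirroring session runs in high-safety (synchronous) mode and the
    databases are synchronized.
    """
    if principal_up:
        # The principal is still serving; there is nothing to fail over from.
        return False
    if not witness_configured:
        # No witness: only a manual (forced) failover is possible.
        return False
    # The mirror and the witness together hold 2 of 3 votes (quorum),
    # so the mirror can assume the principal role.
    return mirror_up and witness_up


if __name__ == "__main__":
    # Principal lost, mirror and witness healthy: automatic failover.
    print(can_auto_failover(False, True, True, True))    # True
    # Same failure without a witness: an administrator must intervene.
    print(can_auto_failover(False, True, False, False))  # False
```

The witness contributes nothing but a vote, which is why a single-mirror configuration without one lands in the "manual intervention" category described above.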

Lync Server Edition | Configuration | High Availability
Standard | Single Server | None
Standard | Paired SE pools (in data center) | Automatic for Resiliency Mode *
Enterprise | Single FE, Single BE | None
Enterprise | Single FE, Paired BEs (SQL Mirror) | None
Enterprise | Single FE, Paired BEs (SQL Mirror) + Witness | Automated Backend Failover only
Enterprise | Two FEs, Single BE | HA for Lync only w/o BE failure
Enterprise | Two FEs, Paired BEs (SQL Mirroring) | HA for Lync only w/o BE failure
Enterprise | Two FEs, Paired BEs (SQL Mirror) + Witness or SQL Cluster | Full HA during non-maintenance
Enterprise | Three+ FEs, Paired BEs (SQL Mirror) + Witness or SQL Cluster | Full HA

Table 1 - High Availability matrix by Deployment Configuration

* The Lync client will eventually utilize a backup registrar for voice if so configured, but some delay should be expected between lost connection to the home pool and the successful retry. See Planning for Central Site Resiliency in the TechNet Library for more information on configuring registrar intervals.
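The matrix in Table 1 reduces to a few decision points: the edition, the number of Front Ends, and whether the Back End can fail over automatically. The following Python sketch encodes that reading of the table. The `Deployment` type, its field names and the back-end labels are hypothetical, chosen only for illustration; the returned strings are the table's own entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical description of a pool; field names are illustrative only."""
    edition: str                  # "Standard" or "Enterprise"
    front_ends: int = 1           # Front End servers in the pool
    back_end: str = "single"      # "single", "mirror", "mirror+witness", or "cluster"
    paired_se_pool: bool = False  # Standard Edition only: paired with another SE pool


def ha_level(d: Deployment) -> str:
    """Return the Table 1 'High Availability' entry for a deployment."""
    if d.edition == "Standard":
        # All-in-one server: no in-pool redundancy, only pairing helps.
        return "Automatic for Resiliency Mode" if d.paired_se_pool else "None"

    # A witness-backed mirror or a SQL cluster fails over the Back End automatically.
    automatic_be = d.back_end in ("mirror+witness", "cluster")

    if d.front_ends == 1:
        return "Automated Backend Failover only" if automatic_be else "None"
    if not automatic_be:
        return "HA for Lync only w/o BE failure"
    if d.front_ends == 2:
        # Servicing one of two Front Ends leaves no redundancy.
        return "Full HA during non-maintenance"
    return "Full HA"


if __name__ == "__main__":
    print(ha_level(Deployment("Enterprise", front_ends=2, back_end="mirror")))
    print(ha_level(Deployment("Enterprise", front_ends=3, back_end="cluster")))
```

Laid out this way, the table's key point stands out: adding Front Ends and adding an automatic Back End failover path are independent requirements, and only the combination of both (with three or more Front Ends) yields full HA.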

Disaster Recovery

Service outages are still possible when a hardware failure affects an entire pool (a Standard Edition server failure, a network appliance, a server rack, etc.) or when there is a location-based problem (such as a network outage in a particular datacenter). In these cases, disaster recovery planning focuses on re-establishing services quickly and with minimal data loss. Lync Server 2013 supports two disaster recovery functions via “pool pairing”: site resiliency and pool failover.

With site resiliency, users of one Lync Server 2013 pool can be configured to connect automatically to a backup pool for resiliency mode services (a subset of full production features) when their own pool is unavailable. How long the pool must be unavailable before this happens is now configurable for both basic SIP connectivity and voice services (see the footnote to Table 1). With pool failover, users are moved manually from one pool to another (failover) and then back (failback). Although neither failover nor failback is automated, Lync Server 2013 introduces data replication between paired pools during regular (non-disaster) service. This real-time replication of persistent data enables faster recovery of services with minimal risk of data loss in the event of a site (datacenter) failure.
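The difference between the two functions (automatic resiliency mode after a configurable outage interval, versus a manual move that restores full services) can be summarized as a small decision function. This is a hypothetical Python model of the behavior described above, not Lync client logic; the function name, parameters and the example interval of 120 seconds are assumptions made for illustration, not product defaults.

```python
def effective_service(home_pool_up: bool, seconds_unreachable: float,
                      failover_interval_s: float, has_backup_pool: bool,
                      pool_failover_invoked: bool = False) -> str:
    """Where a user gets service under pool pairing (hypothetical model)."""
    if has_backup_pool and pool_failover_invoked:
        # Pool failover: an administrator has moved the users to the
        # backup pool, which then serves them with full features.
        return "backup pool: full services"
    if home_pool_up:
        return "home pool: full services"
    if has_backup_pool and seconds_unreachable >= failover_interval_s:
        # Site resiliency: after the configured outage interval the client
        # registers with the backup registrar automatically.
        return "backup pool: resiliency mode"
    # Before the interval elapses (or with no backup pool) the client keeps retrying.
    return "no registrar: retrying home pool"


if __name__ == "__main__":
    interval = 120  # hypothetical interval in seconds, not a product default
    print(effective_service(False, 30, interval, True))
    print(effective_service(False, 300, interval, True))
    print(effective_service(False, 300, interval, True, pool_failover_invoked=True))
```

The ordering of the checks reflects the text: resiliency mode is the automatic stopgap, and only the manual pool failover brings users back to full services while their home pool is down.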

Lync Server Edition | Configuration | Recovery Enabled
Standard | Paired SE Pools | Site Resiliency (Automated) and Pool Failover (RTO/RPO of 30 min after manual initiation)
Enterprise | Paired EE Pools | Site Resiliency (Automated) and Pool Failover (RTO/RPO of 30 min after manual initiation)

Table 2 – Site Resiliency by Product Edition

Note: While SQL Clustering is now supported, Metropolitan Site Resiliency remains unsupported in Lync Server 2013. All the nodes in a SQL cluster serving a Lync pool, as well as the associated Front End servers, should be deployed within the same physical site as represented in Topology Builder.

Configuration and Considerations

From an overall solution standpoint, there are best practices for pairing pools, such as pairing only pools of the same edition (EE pools with EE pools, SE pools with SE pools) and the same platform (hardware with hardware, virtual with virtual). It is also recommended to pair pools within the same geographic region to avoid performance problems across WANs. When a set of users is failed over from one pool to another, their conferences are hosted on the new pool until they are failed back. If the failover is from one continent to another, all users joining a conference, even those local to each other, will traverse the WAN to join the conference hosted in the failover pool. Because elements such as Call Admission Control settings and Direct Inward Dial (DID) numbers are tied to pools and are not easily transferred from one region to another (for example, from North America to Europe), even an organization with robust WAN links should consider cross-region pairing carefully.
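These pairing guidelines lend themselves to a pre-deployment checklist. The Python sketch below flags a proposed pair that breaks any of them. The `Pool` type, its fields and the contoso.com FQDNs are hypothetical examples; the checks encode only the three recommendations stated above (same edition, same platform, same region).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Pool:
    fqdn: str      # hypothetical pool FQDN
    edition: str   # "Standard" or "Enterprise"
    platform: str  # "physical" or "virtual"
    region: str    # e.g. "North America" or "Europe"


def pairing_warnings(a: Pool, b: Pool) -> List[str]:
    """Check a proposed pool pair against the pairing best practices."""
    warnings = []
    if a.edition != b.edition:
        warnings.append("edition mismatch: pair EE with EE and SE with SE")
    if a.platform != b.platform:
        warnings.append("platform mismatch: pair hardware with hardware, virtual with virtual")
    if a.region != b.region:
        # Failed-over conferences would cross the WAN, and CAC settings and
        # DID numbers do not move easily between regions.
        warnings.append("cross-region pair: expect WAN traversal for conferences")
    return warnings


if __name__ == "__main__":
    a = Pool("pool-a.contoso.com", "Enterprise", "physical", "North America")
    b = Pool("pool-b.contoso.com", "Enterprise", "virtual", "Europe")
    for w in pairing_warnings(a, b):
        print(w)
```

Returning a list of warnings rather than a single pass/fail reflects the text's tone: these are recommendations, and a cross-region pair may still be a deliberate choice for an organization with robust WAN links.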

Finally, many deployments have users homed on Survivable Branch Appliances (SBAs), typically at remote office locations. As in Lync 2010, an SBA can be paired with a Lync Server 2013 pool for failover. If the SBA fails, the clients of users homed on it can redirect to the Lync pool for many services:

Users | Configuration | Resiliency Achieved
Homed on SBA | SBA paired with pool, both functional | All Services
Homed on SBA | SBA paired with pool, pool fails | Resiliency Mode
Homed on SBA | SBA paired with pool, SBA fails | All Services

Table 3 – Resiliency with SBAs

* While SBAs can be paired with a Lync pool, they cannot use pool failover services. So if an SBA is paired with Pool A, which is in turn paired with Pool B, and Pool A fails, users homed on Pool A will redirect to the backup registrar in Pool B, but users homed on the SBA will not.
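Table 3 and the note on pool failover can be captured in a short lookup. In this Python sketch, the dictionary mirrors the three rows of Table 3 and a second helper encodes the note that only pool-homed users follow a pool failover. The names are hypothetical, and the case where both the SBA and its pool are down is not covered by the table, so the sketch reports that rather than guessing a behavior.

```python
# Table 3 as a lookup: (sba_up, pool_up) -> service for users homed on the SBA.
SBA_RESILIENCY = {
    (True, True): "All Services",
    (True, False): "Resiliency Mode",
    (False, True): "All Services",
}


def sba_user_service(sba_up: bool, pool_up: bool) -> str:
    # Both the SBA and its pool down is not covered by Table 3; this sketch
    # reports that instead of inventing a behavior.
    return SBA_RESILIENCY.get((sba_up, pool_up), "not covered by Table 3")


def follows_pool_failover(homed_on: str) -> bool:
    """When Pool A (paired with Pool B) fails, do these users move to Pool B?

    homed_on is "pool" for users homed on Pool A, or "sba" for users homed on
    an SBA paired with Pool A. Per the note above, only the pool's users redirect.
    """
    return homed_on == "pool"


if __name__ == "__main__":
    print(sba_user_service(sba_up=True, pool_up=False))
    print(follows_pool_failover("sba"))
```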