Windows Server 2012 R2 Virtual Machine Recovery from Network Disconnects

This post has been republished via RSS; it originally appeared at: Failover Clustering articles.

First published on MSDN on Sep 04, 2013

Overview


Windows Server Failover Clustering has always monitored the running state of virtual machines and the health of clustered networks and clustered storage.  Windows Server 2012 R2 extends this failure detection to include monitoring of virtual machine network and virtual switch connectivity.


Windows Server 2012 R2 introduces new functionality that allows a virtual machine (VM) to be moved to another node in a failover cluster, using live migration, if a network that it is using becomes disconnected.  This improves availability when a network connection issue would otherwise cut off clients from the services running inside the VM: the VM is moved to a node that can provide it with network access.  By default, the Protected Network setting is enabled for all virtual adapters, on the assumption that most networks a VM uses are important enough that the VM should be relocated if the network becomes disconnected.


A VM is not live migrated to a destination node that lacks the network that is disconnected on the current node.  This avoids moving a VM to a node that is missing the very resource whose loss triggered the move.  The cluster selects another node instead, unless no node in the cluster has the required network and system resources.


If a network issue on a host affects more VMs than can be live migrated concurrently, the remaining live migrations are queued.  If the disconnected network becomes available again while VMs are still in the queue, the pending live migrations are canceled.
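
This article does not name the setting behind that concurrency limit.  One place to look, offered as an assumption rather than a confirmed link, is the Hyper-V host setting for simultaneous live migrations, which can be read on each node with Get-VMHost:

```powershell
# Assumption: the host-level limit on simultaneous live migrations is the
# cap that determines when migrations are queued. Run this on each cluster node.
Get-VMHost | Format-List Name, VirtualMachineMigrationEnabled, MaximumVirtualMachineMigrations
```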


The VM’s network adapter settings have a new property in the advanced configuration section that lets you choose whether the network the adapter is connected to is important enough to the availability of the VM that the VM should be moved if the network fails.  For instance, if you have an external network where clients connect to the application running inside the VM, and another network that is used for backups, you can disable the property for the network adapter used for backups but leave it enabled for the external network.  If the backup network becomes disconnected, the VM will not be moved.  If the client access network becomes disconnected, the VM will be live migrated to a node where that network is available.


We still recommend using network teaming for any critical networks, because it provides redundancy and handles many network failures seamlessly.


Walkthrough


Let’s walk through some of the concepts to illustrate how this functionality works and how to configure it.


The diagram below (Diagram 1) shows a simple two-node cluster with a VM running on it.


(Note: the network configuration depicted in this document is used as an example; the network configuration for your systems may vary depending on the number of adapters, the speed of the adapters, and other network considerations.)


On each node, the parent partition (sometimes referred to as the management partition) has a dedicated network adapter.  A second adapter on each node is configured with a Hyper-V virtual switch.  The virtual machine has a synthetic network adapter that is connected to the virtual switch.


If the physical network adapter that the virtual switch is using on node A becomes disconnected, the virtual machine is live migrated to node B, because node B still has a connection to the network that the virtual machine uses.  The live migration from node A to node B is possible because the private network between the two servers is still functioning.



Diagram 1


Configuring a VM’s virtual network adapter to not cause the VM to be moved if it is disconnected


Let’s take the same configuration and add another network adapter to each of the nodes and connect it to another virtual switch on each node (see diagram 2 below).  We then configure the VM for a second virtual adapter and connect it to the new virtual switch.  For this scenario, the network may be used for backups, or for communications between VMs for which a short outage doesn’t affect the clients that use the VM.  Let’s call this new network “Backup”.


Because this new network can tolerate short outages, we want to configure the VM’s virtual adapter so that it is not treated as a critical network.  That allows the Backup network to become disconnected without causing the VM to be moved to another node of the cluster.


To do this, open the VM’s settings, go to the virtual adapter for the Backup network, and expand it so that you see the “Advanced Features” item.  Select it, and then clear the “Protected Network” check box (see Screen Shot 1 below).


As noted in the overview, the Protected Network setting is enabled by default for all virtual adapters, so you only need to change it for networks that can tolerate a disconnect.



Diagram 2



Screen Shot 1


Configuring a VM’s network adapter to not react to a network disconnect using Windows PowerShell


Here is the Windows PowerShell command, with its output, that shows the virtual network adapters for a VM named “VM1”.  This command works from any node of the cluster, even if the VM is not hosted on the node where you run it.  If you want to run the command from a computer that is not part of the cluster, you can add the Get-Cluster cmdlet at the start of the command line and specify the cluster name.


PS C:\Windows\system32> Get-ClusterGroup VM1 | Get-VM | Get-VMNetworkAdapter | FL VMName,SwitchName,MacAddress,ClusterMonitored

VMName           : VM1
SwitchName       : Corp
MacAddress       : 00155D867239
ClusterMonitored : True

VMName           : VM1
SwitchName       : Storage
MacAddress       : 00155D86723A
ClusterMonitored : True

VMName           : VM1
SwitchName       : Private
MacAddress       : 00155D86723B
ClusterMonitored : True
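
As a sketch of the remote variant described above, the same pipeline can be prefixed with Get-Cluster.  The cluster name “Cluster1” is a placeholder and does not come from this walkthrough; the sketch also assumes that the Failover Clustering and Hyper-V PowerShell modules are installed on the computer you run it from.

```powershell
# "Cluster1" is a hypothetical cluster name; substitute your own.
Get-Cluster -Name "Cluster1" |
    Get-ClusterGroup -Name "VM1" |
    Get-VM |
    Get-VMNetworkAdapter |
    Format-List VMName, SwitchName, MacAddress, ClusterMonitored
```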



Here is the Windows PowerShell command that disables the ClusterMonitored property for the network adapter that is configured to use the virtual switch named “Private”.


(Note that the property is “ClusterMonitored” but the parameter that changes it is “NotMonitoredInCluster”.  Therefore, specifying -NotMonitoredInCluster with True actually changes the ClusterMonitored property to False, and vice versa.)


PS C:\Windows\system32> Get-ClusterGroup VM1 | Get-VM | Get-VMNetworkAdapter | Where-Object {$_.SwitchName -eq "Private"} | Set-VMNetworkAdapter -NotMonitoredInCluster $True

PS C:\Windows\system32> Get-ClusterGroup VM1 | Get-VM | Get-VMNetworkAdapter | FL VMName,SwitchName,MacAddress,ClusterMonitored

VMName           : VM1
SwitchName       : Corp
MacAddress       : 00155D867239
ClusterMonitored : True

VMName           : VM1
SwitchName       : Storage
MacAddress       : 00155D86723A
ClusterMonitored : True

VMName           : VM1
SwitchName       : Private
MacAddress       : 00155D86723B
ClusterMonitored : False
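
The “vice versa” case can be sketched the same way.  The first command below turns monitoring back on for the “Private” adapter.  The second is an illustrative audit that lists every clustered VM adapter whose protection has been turned off; it assumes that the cluster group objects expose a GroupType property with the value VirtualMachine, which you should verify on your own cluster before relying on it.

```powershell
# Re-enable monitoring (Protected Network) on the adapter connected to "Private".
Get-ClusterGroup VM1 | Get-VM | Get-VMNetworkAdapter |
    Where-Object { $_.SwitchName -eq "Private" } |
    Set-VMNetworkAdapter -NotMonitoredInCluster $False

# Illustrative audit: list adapters on clustered VMs that are not monitored.
# Assumption: GroupType is available and equals "VirtualMachine" for VM groups.
Get-ClusterGroup |
    Where-Object { $_.GroupType -eq "VirtualMachine" } |
    Get-VM |
    Get-VMNetworkAdapter |
    Where-Object { -not $_.ClusterMonitored } |
    Format-Table VMName, SwitchName, MacAddress
```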



Testing


You can test this behavior by disconnecting the network cable for the physical adapter of a server where a VM is running.


It may take up to 1 minute for the cluster to detect that a virtual machine is affected by a network disconnect.  Each virtual machine on a cluster has a cluster resource that monitors the virtual machine for failures.  By default the cluster resource will check the state of each virtual switch that a VM is using every 60 seconds.


As a result, the time a specific VM takes to detect that a virtual switch is attached to a disconnected physical NIC can be anywhere from very short to 60 seconds, depending on when the disconnect happens relative to the next check for that VM.


It also means that if more than one VM uses a switch that becomes disconnected, the VMs will not all enter the state that causes them to be live migrated at the same time.


As noted previously, if the network becomes connected again, any VMs that are still queued to be moved are removed from the queue and remain on the same server.  Any live migrations already in progress will finish.
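
One way to observe the test, sketched here with the VM name “VM1” from the earlier examples and an arbitrary 5-second polling interval, is to watch the owner node of the VM’s cluster group from any cluster node while you disconnect the cable:

```powershell
# Poll the owner node and state of the VM's cluster group every 5 seconds.
# Press Ctrl+C to stop once the owner node has changed.
while ($true) {
    Get-ClusterGroup -Name "VM1" |
        Select-Object @{ Name = "Time"; Expression = { Get-Date -Format "HH:mm:ss" } },
                      Name, OwnerNode, State
    Start-Sleep -Seconds 5
}
```

Given the 60-second check interval described above, the owner node may not change for up to a minute after the disconnect, plus the time the live migration itself takes.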


Steven Ekren
Senior Program Manager
Windows Server Failover Clustering and High Availability
