Top 10 Networking Features in Windows Server 2019: #5 Network Performance Improvements for Virtual Workloads


First published on TECHNET on Aug 22, 2018

This blog is part of a series for the Top 10 Networking Features in Windows Server 2019!
-- Click HERE to see the other blogs in this series.

Look for the Try it out sections then give us some feedback in the comments!
Don't forget to tune in next week for the next feature in our Top 10 list!
The Software Defined Data Center (SDDC) spans technologies like Hyper-V, Storage Spaces Direct (S2D), and Software Defined Networking.  Whether you run compute workloads like File, SQL, and VDI, operate an S2D cluster, or use your SDN environment to make hybrid cloud a reality, we all crave network performance. We have a “need for speed,” and no matter how much you have, you can always use more.

In Windows Server 2016, we demonstrated 40 Gbps into a VM with Virtual Machine Multi-Queue (VMMQ).  However, that high-speed throughput came at the additional cost of complex planning, baselining, tuning, and monitoring to alleviate the CPU overhead of network processing; otherwise, your users would be quick to let you know when the performance of your solution degraded.  In Windows Server 2019, virtual workloads will reach and maintain 40 Gbps while lowering CPU utilization and eliminating the painful configuration and tuning cost previously imposed on you, the IT Pro.

To do this, we’ve implemented two new features:

  • Receive Segment Coalescing in the vSwitch

  • Dynamic Virtual Machine Multi-Queue (d.VMMQ)


These features maximize the network throughput to virtual machines without requiring you to constantly tune or over-provision your host. This lowers the Operations & Maintenance cost while increasing the available density of your hosts. The efforts outlined here cover our progress in accelerating the host and guest; in a future post, my colleague Harini Ramakrishnan will discuss our efforts to accelerate the app.


Receive Segment Coalescing in the vSwitch


Number 1 on our playlist is an “oldie but goodie.” Windows Server 2019 brings a remix for Receive Segment Coalescing (RSC), leading to more efficient host processing and throughput gains for virtual workloads. As the name implies, this feature benefits any traffic running through the virtual switch, including traditional Hyper-V compute workloads, some Storage Spaces Direct patterns, and Software Defined Networking implementations (for example, see Anirban's post last week regarding GRE gateway improvements: #6 - High Performance SDN Gateways).

Prior to this release, RSC was a hardware offload (in the NIC). Unfortunately, this optimization was disabled the moment you attached a virtual switch, so virtual workloads were not able to take advantage of it. In Windows Server 2019, RSC (in the vSwitch) works with virtual workloads and is enabled by default!  No action required on your part!
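If you'd like to confirm the state on your own host, a quick check along these lines should do it. This is a minimal sketch: Get-NetAdapterRsc reports the hardware offload on the physical NICs, and the vSwitch setting is selected with a wildcard because the exact property name can vary by build.

    # Hardware RSC offload state on the physical NICs (this is the offload that was
    # bypassed once a vSwitch was attached in earlier releases)
    Get-NetAdapterRsc

    # RSC in the vSwitch (new in Windows Server 2019); the wildcard keeps the property name flexible
    Get-VMSwitch | Select-Object Name, *Rsc*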

Here’s a quick throughput performance example from some of our early testing.  In the task manager window on the left, you see a virtual NIC on top of a 40 Gbps physical NIC without RSC in the vSwitch. As you can see, the system requires an average of 28% CPU utilization to process 23.9 Gbps.





In the task manager window on the right, the same virtual NIC is now benefiting from RSC in the vSwitch. The CPU processing has decreased to 23% despite the receive throughput increasing to 37.9 Gbps!

Here's the performance summary:
                              Average CPU Utilization    Average Throughput
Without RSC in the vSwitch    28%                        23.9 Gbps
With RSC in the vSwitch       23%                        37.9 Gbps
Net change                    17.86% decrease in CPU     58.58% increase in throughput
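For reference, the percentages in the last row are simple relative deltas of the numbers above; spelled out in PowerShell:

    # Relative change calculations behind the summary row
    $cpuWithout  = 28;   $cpuWith  = 23
    $tputWithout = 23.9; $tputWith = 37.9

    'CPU decrease:        {0:P2}' -f (($cpuWithout - $cpuWith) / $cpuWithout)      # ~17.86%
    'Throughput increase: {0:P2}' -f (($tputWith - $tputWithout) / $tputWithout)   # ~58.58%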

Under the Hood


RSC in the vSwitch combines TCP segments that are part of the same TCP stream into larger segments destined for a Hyper-V guest. Processing coalesced (fewer) packets is far more efficient than processing the same data as many small segments, which leads to large performance gains for Hyper-V virtual machines.

Performance gains are seen in both high- and low-throughput environments; high-throughput environments benefit from more efficient CPU processing (lower CPU utilization on the host), while low-throughput environments may even see throughput gains in addition to the processing efficiencies. Take a look at RSC in action.
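You can watch the effect on your own host by sampling a couple of performance counters while a receive test runs against a VM. A rough sketch follows; the counter paths are assumptions, so confirm the exact names on your system with Get-Counter -ListSet.

    # Sample vmNIC receive throughput and overall host CPU while a receive test runs.
    # Confirm counter names first with: Get-Counter -ListSet 'Hyper-V Virtual Network Adapter*'
    Get-Counter -Counter @(
        '\Hyper-V Virtual Network Adapter(*)\Bytes Received/sec',
        '\Processor Information(_Total)\% Processor Time'
    ) -SampleInterval 2 -MaxSamples 15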


Get Started!


If you’re a Windows Server 2019 Insider using Hyper-V, Storage Spaces Direct, or Software Defined Networking (including the High Performance Gateways Anirban talked about last week!), you’re likely already consuming this feature; it’s enabled by default! But of course, if you’d like to compare the results yourself, check out our validation guide below.
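For a rough before/after comparison ahead of the full validation guide, a sketch like the following works. The -EnableSoftwareRsc parameter name is an assumption based on recent Insider builds (check Get-Help Set-VMSwitch on your build), and 'ConvergedSwitch' is a placeholder vSwitch name.

    # Baseline: temporarily disable RSC in the vSwitch (parameter name assumed from Insider builds)
    Set-VMSwitch -Name 'ConvergedSwitch' -EnableSoftwareRsc $false
    # ... run your receive test against the VM (for example, ntttcp or ctsTraffic) ...

    # Re-enable RSC in the vSwitch, repeat the same test, then compare throughput and host CPU
    Set-VMSwitch -Name 'ConvergedSwitch' -EnableSoftwareRsc $true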



Ready to give it a shot!?   Download the latest Insider build and Try it out!



Dynamic Virtual Machine Multi-Queue (d.VMMQ)


With the advent of 10 Gbps NICs (and higher), the processing required for the network traffic alone exceeded what a single CPU could accomplish. Virtual Machine Queue (VMQ) and its successor, Virtual Machine Multi-Queue (VMMQ), allowed traffic destined for a vmNIC to be processed by one or more different processor cores.

Unfortunately, this required complex planning, baselining, tuning, and monitoring; often more effort than the typical IT Pro intended to expend.  Even then, problems arose: a heterogeneous hardware footprint in your datacenter meant the optimal configuration varied from host to host, and whenever retuning was needed, virtual machines might not maintain a consistent level of performance.

To combat these problems, Windows Server 2019 dynamically tunes the host for maximum CPU efficiency and consistent virtual machine throughput. Dynamic VMMQ requires no setup once a supporting driver is in place and autotunes the existing workload to ensure optimal throughput is maintained for each virtual machine. This reduces the OPEX cost imposed by previous versions of this technology.
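For contrast, this is roughly the kind of per-vmNIC tuning you would have done by hand on Windows Server 2016 and revisited whenever hardware or workloads changed. 'SQL01' and the queue-pair count are placeholders, and the parameter and property names come from the 2016-era Set-VMNetworkAdapter cmdlet, shown here as an assumption.

    # Windows Server 2016-era static tuning that Dynamic VMMQ makes unnecessary
    Set-VMNetworkAdapter -VMName 'SQL01' -VmmqEnabled $true -VmmqQueuePairs 8

    # Check what a vmNIC is currently configured with
    Get-VMNetworkAdapter -VMName 'SQL01' | Select-Object VMName, VmmqEnabled, VmmqQueuePairs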

How it Works


There are two key outcomes from this technology:

  • When network throughput is low: The system coalesces traffic received on a vmNIC onto as few CPUs as possible


Here’s a VM receiving around 5.3 Gbps.




The system can coalesce all packets onto one CPU for processing efficiency.





  • When network throughput is high: The system automatically expands traffic received across as many CPUs as needed


The VM’s traffic has grown to about 21 Gbps, which is more than a single CPU can handle.




The system expands the traffic across additional CPUs as necessary (and available), in this case five, to keep up with the demand.
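If you want to watch this coalescing and expansion yourself, the queue-to-processor mapping on the physical NIC is visible from PowerShell. A simple sketch; on a Dynamic VMMQ host you should see the assignments shift as the load changes.

    # Show how the physical NIC's queues are currently mapped to processors and VMs.
    # Re-run while the VM ramps up and down to watch the assignments expand and coalesce.
    Get-NetAdapterVmq
    Get-NetAdapterVmqQueue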



Here's a quick video on Dynamic VMMQ in a low-throughput scenario.  You'll see the dynamic scheduling algorithm coalesce all the network throughput onto one core.  Then, once network traffic has completed, the queues will return to their "ready" state allowing them to expand very quickly if a burst of traffic occurs.


Get Started!


This feature requires a driver update for your NICs to a Dynamic VMMQ capable driver (referred to by some vendors as RSSv2). Drivers for Dynamic VMMQ will not be included inbox as this is an advanced feature, so please contact your IHV or OEM for the latest drivers.

If you are purchasing new hardware, pay special attention to the available NICs and verify that they have received the SDDC Premium logo through our certification program (click on a specific NIC and look for SDDC Premium). If not, Dynamic VMMQ is not supported on those devices and you will default to the traditional static VMMQ mechanism.
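Before you go hunting for an updated driver, it's worth taking stock of what's already on the host; standard Get-NetAdapter output is enough for that.

    # Inventory the physical NIC drivers; compare against the Dynamic VMMQ (RSSv2) capable
    # driver versions published by your IHV/OEM
    Get-NetAdapter -Physical | Format-List Name, InterfaceDescription, Driver*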



Ready to give it a shot!?   Download the latest Insider build and Try it out!



Summary


Regardless of workload, your virtual machines need the highest possible throughput. Not only can Windows Server 2019 reach outstanding network performance, it eliminates the costly planning, baselining, and tuning required by previous Windows versions. You may still get a late-night call to troubleshoot a poorly performing virtual machine, but it won’t be because of the network throughput!

Thanks for reading and see you at Ignite!

Dan “Auto-tuning” Cuomo
