Storage Spaces Direct throughput with iWARP

First published on TECHNET on Mar 13, 2017
Hello, Claus here again. It has been a while since I last posted here, and a few things have changed since last time. Windows Server has moved into the Windows and Devices Group, and we have moved to a new building with a better café but a worse view 😊. On a personal note, I can be seen waddling the hallways, as I have had foot surgery.

At Microsoft Ignite 2016, I did a demo at the 28-minute mark of the Meet Windows Server 2016 and System Center 2016 session, showing how Storage Spaces Direct can deliver massive amounts of IOPS to many virtual machines with various Storage QoS settings. I encourage you to watch it if you haven't already, or go watch it again 😊. In the demo, we used a 16-node cluster connected over iWARP with 40GbE Chelsio T580-CR iWARP adapters, showing 6M+ read IOPS. Since then, Chelsio has released their 100GbE T6 adapter, and we wanted to take a peek at what kind of network throughput would be possible with this new NIC.
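
For anyone who wants to play with Storage QoS the way the demo did, the gist is a named policy created on the cluster and then attached to the VMs' virtual hard disks. Here is a minimal sketch with hypothetical policy names, IOPS limits, and VM names (not the values from the demo):

```powershell
# Create a dedicated Storage QoS policy on the cluster (limits are hypothetical).
$policy = New-StorageQosPolicy -Name "GoldTier" -PolicyType Dedicated `
    -MinimumIops 1000 -MaximumIops 5000

# Attach the policy to every virtual hard disk of a VM (VM name is hypothetical).
Get-VM -Name "TestVM01" | Get-VMHardDiskDrive |
    Set-VMHardDiskDrive -QoSPolicyID $policy.PolicyId

# Watch the flows to confirm the policy is being enforced.
Get-StorageQosFlow | Sort-Object InitiatorName | Format-Table -AutoSize
```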

We used the following hardware configuration:

  • 4 nodes of Dell R730xd

    • 2x E5-2660 v3 @ 2.6 GHz (10c/20t)

    • 256GiB DDR4 2133 MHz (16x 16GiB DIMMs)

    • 2x Chelsio T6 100GbE NICs (PCIe 3.0 x16), one port connected per adapter, QSFP28 passive copper cabling

    • Performance Power Plan

    • Storage:

      • 4x 3.2TB NVMe Samsung PM1725 (PCIe 3.0 x8)

      • 4x SSD + 12x HDD (not in use: all load from Samsung PM1725)



    • Windows Server 2016 + Storage Spaces Direct

      • Cache: Samsung PM1725

      • Capacity: SSD + HDD (not in use: all load from cache)

      • 4x 2TB three-way mirrored virtual disks, one per cluster node (a volume-creation sketch follows this list)

      • 20 Azure A1-sized VMs (1 vCPU, 1.75GiB RAM) per node

      • OS High Performance Power Plan



    • Load:

      • DISKSPD workload generator

      • VM Fleet workload orchestrator

      • 80 virtual machines, each with a 16GiB test file in a VHDX

      • 512KiB 100% random reads at a queue depth of 3 per VM (the DISKSPD command line is sketched after this list)

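For reference, a configuration like the one above (Storage Spaces Direct enabled, one three-way mirrored volume per node) can be stood up along the following lines; treat it as a sketch with illustrative volume names rather than the exact commands used for this run:

```powershell
# Enable Storage Spaces Direct on an existing failover cluster. The NVMe
# devices are claimed as cache automatically; SSD and HDD become capacity.
Enable-ClusterStorageSpacesDirect

# Create one 2TB three-way mirrored CSV volume per node (names are illustrative).
1..4 | ForEach-Object {
    New-Volume -StoragePoolFriendlyName "S2D*" `
        -FriendlyName "MirrorVol$_" `
        -FileSystem CSVFS_ReFS `
        -ResiliencySettingName Mirror `
        -PhysicalDiskRedundancy 2 `
        -Size 2TB
}
```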

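The per-VM load above maps to a DISKSPD command line roughly like the one below, which VM Fleet starts inside each guest; the file path and duration are illustrative rather than the exact values from this run:

```powershell
# 512KiB blocks, 100% random reads, queue depth 3, one thread, caching disabled,
# latency statistics captured, against the 16GiB test file inside the guest.
.\diskspd.exe -b512K -r -o3 -t1 -w0 -d120 -Sh -L C:\run\testfile.dat
```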

We did not configure DCB (PFC) in our deployment, since it is not required in iWARP configurations.
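
Because iWARP carries RDMA over TCP, there is nothing to set up on the switches for lossless traffic; the main thing worth verifying is that RDMA is enabled on the adapters and that SMB Direct is actually being used. A quick sanity check looks something like this; adapter and server names will of course differ in your deployment:

```powershell
# Confirm the adapters report RDMA as enabled.
Get-NetAdapterRdma | Format-Table Name, Enabled

# Confirm SMB sees the interfaces as RDMA-capable.
Get-SmbClientNetworkInterface | Format-Table FriendlyName, RdmaCapable

# Once traffic is flowing, confirm the SMB connections selected RDMA-capable paths.
Get-SmbMultichannelConnection | Format-Table ServerName, Selected, ClientRdmaCapable
```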

Below is a screenshot from the VM Fleet watch-cluster window, which reports IOPS, bandwidth, and latency.

[Screenshot: VM Fleet watch-cluster summary showing aggregate IOPS, bandwidth, and read latency]

As you can see, the aggregate bandwidth exceeded 83GB/s, which is very impressive. Each VM realized more than 1GB/s of throughput, and the average read latency stayed under 1.5ms. Those numbers are consistent with each other: by Little's law, a queue depth of 3 at ~1.5ms per I/O works out to roughly 2,000 IOPS per VM, which at 512KiB per I/O is about 1GB/s, and 80 VMs at 1GB/s each lands right around the 83GB/s aggregate.

Let me know what you think.

Until next time

@ClausJor
