To RDMA, or not to RDMA – that is the question

First published on TECHNET on Mar 27, 2017
Hello, Claus here again. By now, you have probably seen some of my blogs and demos on Storage Spaces Direct performance. One of Storage Spaces Direct’s advantages is its support for RDMA networking, which lowers latency and reduces CPU consumption. I often get the question, “Is RDMA required for Storage Spaces Direct?” The answer is no. We support plain old Ethernet (TCP/IP) as long as it’s 10GbE or better. But let’s look a bit deeper.
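
If you want to check whether the NICs in your own deployment are RDMA capable, and toggle RDMA on or off the way we did for this comparison, the NetAdapter RDMA cmdlets are the place to start. A minimal sketch (the adapter name is an example; this is not a record of the exact steps our test team used):

    # Show which adapters report RDMA capability and whether RDMA is enabled on them
    Get-NetAdapterRdma | Format-Table Name, InterfaceDescription, Enabled

    # Turn RDMA off (plain Ethernet/TCP/IP mode) or back on for a specific adapter (example name)
    Disable-NetAdapterRdma -Name "SLOT 2 Port 1"
    Enable-NetAdapterRdma  -Name "SLOT 2 Port 1"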

Recently, we did a performance investigation on new hardware, comparing it with an in-market offering (more about that in another post). We ran the tests with RDMA enabled and with RDMA disabled (Ethernet mode), which provided the data for this post. For this investigation, we used DISKSPD with the following configuration (a sketch of a comparable command line follows the list):

  • DISKSPD version 2.0.17

  • 4K IO

  • 70:30 read/write mix

  • 10 threads, each at queue depth 4 (40 outstanding IOs in total)

  • A 10GiB file per thread (“a modest VHDX”) for a total of 100GiB



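For reference, a DISKSPD command line matching this workload looks roughly like the sketch below. This is an illustration, not the exact command we ran: the target paths and the run duration are assumptions.

    # 4K blocks (-b4K), 30% writes (-w30), random IO (-r), one thread per file (-t1),
    # 4 outstanding IOs per thread (-o4), create 10GiB files (-c10G), run for 120s (-d120),
    # disable software and hardware caching (-Sh), capture latency statistics (-L)
    $targets = 1..10 | ForEach-Object { "C:\ClusterStorage\Volume1\io$_.dat" }
    .\diskspd.exe -b4K -w30 -r -t1 -o4 -c10G -d120 -Sh -L $targets

With ten target files at one thread each, this gives the 10 threads and 40 total outstanding IOs described above.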

We used the following hardware configuration:

  • 4-node cluster

    • Intel S2600WT Platform

    • 2x Intel Xeon E5-2699 v4 CPUs (22 cores / 44 threads, 2.2GHz)

    • 128GiB DDR4 DRAM

    • 4x Intel P3700 NVMe per node

    • Mellanox ConnectX-3 Pro 40GbE, dual-port connected, RoCE v2

    • C-states disabled, OS High Performance power plan, BIOS performance plan, Turbo Boost and Hyper-Threading enabled



  • Software configuration

    • Windows Server 2016 with January roll-up package

    • No cache drives configured

    • 3-copy mirror volume (a sketch of how such a volume is typically created follows below)




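For context, a three-copy mirror volume on a Storage Spaces Direct pool is typically created with New-Volume. A minimal sketch, assuming the default S2D pool naming and an example volume name and size (not necessarily the exact command used for these tests):

    # PhysicalDiskRedundancy 2 yields a three-copy mirror (tolerates two failures);
    # on a pool with four or more nodes this is also the default for mirror resiliency
    New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "Mirror3" `
        -FileSystem CSVFS_ReFS -ResiliencySettingName Mirror `
        -PhysicalDiskRedundancy 2 -Size 1TB
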
We are by no means driving this system hard, which is on purpose: we want to show the delta between RDMA and non-RDMA under a reasonable workload, not at the edge of what the system can do. The results are summarized below.

| Metric                        | RDMA    | TCP/IP  | RDMA advantage                                    |
|-------------------------------|---------|---------|---------------------------------------------------|
| IOPS                          | 185,500 | 145,500 | 40,000 additional IOPS with the same workload     |
| IOPS per % kernel CPU         | 16,300  | 12,800  | 3,500 additional IOPS per percent of CPU consumed |
| 90th percentile write latency | 250µs   | 390µs   | 140µs lower (~36%)                                |
| 90th percentile read latency  | 260µs   | 360µs   | 100µs lower (~28%)                                |

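If you want to verify which transport your own cluster is actually using, the SMB client cmdlets show whether SMB Direct (RDMA) is in play. A quick sketch (property names as of Windows Server 2016):

    # Shows whether each SMB client network interface is RDMA capable
    Get-SmbClientNetworkInterface | Format-Table FriendlyName, RdmaCapable, RssCapable, Speed

    # Lists active SMB Multichannel connections; the RDMA-capable columns indicate
    # whether SMB Direct can be used between client and server
    Get-SmbMultichannelConnection
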
I think there are two key take-aways from this data:

  1. Use RDMA if you want the absolute best performance. RDMA significantly boosts performance: in this test it delivers 28% more IOPS for the same workload, driven by the lower IO latency that RDMA provides. RDMA is also more CPU efficient (27%), leaving more CPU available to run VMs.

  2. TCP/IP is no slouch and is absolutely a viable deployment option. While not quite as fast or as efficient as RDMA, TCP/IP provides solid performance and is well suited for organizations that don’t have the expertise needed to deploy and operate RDMA.


Let me know what you think.

Until next time

Claus
