Best Practice for Running Cadence Spectre X on Microsoft Azure

This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs.

Electronic Design Automation (EDA) consists of a set of software tools and workflows for designing semiconductor products, most notably advanced computer chips. Given today's rapid pace of innovation, there is a growing demand for chips with higher performance, smaller size, and lower power consumption. EDA tools require multiple nodes and many CPUs (cores) in a cluster to meet this growing demand. A high-performance network and a centralized file system support this multi-node cluster, ensuring that all components of the cluster can act as a single unified whole to provide both consistency and scale-out performance.

The cloud's ability to provide EDA engineers with access to massive computing capacities has been widely publicized. What is not widely publicized is the critical high-performance networking and centralized file system infrastructure required to ensure the massive compute capacities can work together as one.

In this example we use the Cadence Spectre X simulator, a leading EDA tool for solving large-scale verification simulation challenges for complex analog, RF, and mixed-signal blocks and subsystems. The Spectre X simulator lets users massively distribute simulation workloads to the Azure cloud and use up to thousands of CPU cores to improve performance and capacity.

Objective

In this article, we first provide performance best practices for compute nodes when running EDA simulations on Azure. We then apply those best practices to run Spectre X jobs on different Azure virtual machines (VMs) in different scenarios, including single-threaded, multi-threaded, and XDP (distributed mode). Finally, we conduct a cost-effectiveness analysis to give guidance on which VMs are the most suitable.

Table 1: EDA Tools landscape, with Circuit Simulation tools in red rectangle.

Best Practices for running EDA on Azure

Below are the key configurations for best performance for Compute nodes when running EDA simulations on Azure:

Place resources in the same Proximity Placement Group, to improve networking performance between VMs.
Disable Hyper Threading (HT) for all VMs, which boosts performance, especially for CPU-bound EDA tools like Spectre X.
Enable Accelerated Networking by default to ensure low network latency among compute VMs and front-end storage solutions.
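
One way to confirm the Hyper Threading setting on a Linux compute node is to read the "Thread(s) per core" field that lscpu reports. The following is a minimal sketch, assuming a Linux VM; the helper names and the sample text are hypothetical and were not part of the benchmark setup.

```python
import re


def threads_per_core(lscpu_text: str) -> int:
    """Return the 'Thread(s) per core' value found in lscpu output."""
    match = re.search(
        r"^Thread\(s\) per core:\s*(\d+)\s*$", lscpu_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("'Thread(s) per core' not found in lscpu output")
    return int(match.group(1))


def hyper_threading_disabled(lscpu_text: str) -> bool:
    """HT is off when each physical core exposes exactly one hardware thread."""
    return threads_per_core(lscpu_text) == 1


# Hypothetical excerpt of lscpu output, for illustration only.
SAMPLE = """\
CPU(s):              44
Thread(s) per core:  1
Core(s) per socket:  22
"""

if __name__ == "__main__":
    print(hyper_threading_disabled(SAMPLE))
```

On a real node, the text could come from subprocess.run(["lscpu"], capture_output=True, text=True).stdout instead of the sample string.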

For a full list of best practices, please refer to the Azure for the Semiconductor industry whitepaper, which also covers using CycleCloud to help orchestrate Azure resources for HPC-type workloads.

Spectre X benchmark environment

Figure 1: Spectre X benchmark environment architecture

The compute VMs, license server VM, and storage solution all reside in the same Proximity Placement Group. The network latency from the compute VMs to the license server and to storage is 0.1~0.2 milliseconds. We used Azure NetApp Files (ANF) as our Network File System (NFS) storage solution with a Premium 4TiB volume, which provides up to 256 MB/s throughput. Cadence Spectre X (version 20.10.348) is installed on that ANF volume. The test design is a representative Post Layout DSPF design with 100+K circuit inventories. The design and the output files are stored on the same ANF volume as well.

Azure VMs benchmarked

Table 2: List of Azure VMs benchmarked

Table 2 shows all the Azure VMs that were benchmarked, along with their CPU type, memory size, and local disks. The VMs target different workloads. For example, FX48mds was designed for a low core count and a high memory-per-core ratio (48GB per physical core), which suits EDA back-end workloads, while the M-series, with total memory of over 11TB, is suitable for post-tape-out or other memory-intensive jobs. Hyper Threading (HT) is a technique that splits a single physical core (pCPU) into 2 virtual cores (vCPUs) in Azure; some VMs are HT enabled and some are not. Because disabling HT is one of our performance best practices, we calculated the price per physical core per hour in the rightmost column for the cost-effectiveness analysis later. All list prices are for the Azure East US region as of March 2022.
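
The price per physical core per hour can be derived from a VM's hourly list price and its vCPU count. The sketch below is one way to express that calculation, assuming that an HT-enabled VM exposes 2 vCPUs per physical core; the function names and the prices are placeholders, not the values in Table 2.

```python
def physical_cores(vcpus: int, ht_enabled: bool) -> int:
    """Physical cores behind a VM size: half the vCPUs when HT is enabled."""
    if vcpus < 1:
        raise ValueError("vcpus must be positive")
    return vcpus // 2 if ht_enabled else vcpus


def price_per_physical_core_hour(
    vm_price_per_hour: float, vcpus: int, ht_enabled: bool
) -> float:
    """Hourly VM list price divided by the number of physical cores."""
    return vm_price_per_hour / physical_cores(vcpus, ht_enabled)


if __name__ == "__main__":
    # Placeholder prices, for illustration only (not taken from Table 2).
    print(price_per_physical_core_hour(3.20, vcpus=64, ht_enabled=True))
    print(price_per_physical_core_hour(3.60, vcpus=120, ht_enabled=False))
```

Normalizing by physical cores rather than vCPUs keeps the comparison fair once HT is disabled, because each single-threaded job then occupies one physical core.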

Resources utilization when running Spectre X

The simulations were run while varying the number of threads per job (+mt option), and the total elapsed time was retrieved from the output log file of each run. Below is an example command to run a single-threaded (+mt=1) Spectre X job.

spectre -64 +preset=cx +mt=1 input.scs -o SPECTREX_cx_1t +lqt 0 -f sst2
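
To sweep the thread count, only the +mt value needs to change between runs. The sketch below builds the command line for each thread count from the example above. It is an illustration rather than the harness used for the benchmark, and the output naming pattern (SPECTREX_cx_<N>t) is an assumption extrapolated from the single-threaded example.

```python
import shlex


def spectre_command(threads: int, netlist: str = "input.scs") -> list[str]:
    """Build the Spectre X argument list for a given thread count.

    All flags are copied from the single-threaded example; only +mt and the
    output name (an assumed naming pattern) vary with the thread count.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return [
        "spectre", "-64", "+preset=cx", f"+mt={threads}", netlist,
        "-o", f"SPECTREX_cx_{threads}t", "+lqt", "0", "-f", "sst2",
    ]


if __name__ == "__main__":
    # Print the command for each thread count used in the benchmark.
    for n in (1, 2, 4, 8):
        print(shlex.join(spectre_command(n)))
```

Each argument list can be passed to subprocess.run to launch the job on a node where Spectre X is installed and licensed.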

As expected, we found that during a run the number of utilized CPUs equals the number of threads per job. Figure 2 is a screenshot showing CPU utilization staying at 95+% on 16 CPUs when running a 16-threaded Spectre X job on a 44-core VM (HC44rs). We also observed low storage read/write operations (<2k IOPS), low network bandwidth utilization, and low memory utilization during the run, which indicates that Spectre X is a very compute-intensive, CPU-bound workload.

Figure 2: CPU utilization stays at 95+% on 16 CPUs when running a 16-threaded Spectre X job on a 44-core VM (HC44rs)

Benchmark results

Performance among VMs

Figure 3 shows the elapsed runtime (sec) when running a single-threaded Spectre X job on different VMs. The lower the better. HBv3 performs the best, whether it runs on a standard 120-core VM or on a constrained 16-core VM. D64 v5 and E64 v5 ranked second in the test. HBv3, D64 v5, and E64 v5 can perform 10~30% better than the same test on other clouds.

Figure 3: Elapsed runtime in seconds when running a single-threaded Spectre X job on different VMs with HT disabled. The lower the better.

Performance improves linearly for multithreaded jobs

Figure 4 shows how the elapsed time decreases when running the job with multiple threads. We observed that the elapsed time decreased linearly and consistently when running the same job with 1, 2, 4, and 8 threads across all VMs. The same job run with 8 threads can be 4~6x faster than when run single-threaded.

Figure 4: Elapsed time decreases linearly when increasing # of threads per Spectre X job.
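
The speedup figures above can be turned into a parallel efficiency, which is the speedup divided by the number of threads. The sketch below shows the arithmetic; the elapsed times in the example are placeholders, not measured values from Figure 4.

```python
def speedup(single_thread_seconds: float, multi_thread_seconds: float) -> float:
    """How many times faster the multi-threaded run is than the single-threaded run."""
    return single_thread_seconds / multi_thread_seconds


def parallel_efficiency(
    single_thread_seconds: float, multi_thread_seconds: float, threads: int
) -> float:
    """Speedup divided by thread count; 1.0 means ideal linear scaling."""
    return speedup(single_thread_seconds, multi_thread_seconds) / threads


if __name__ == "__main__":
    # Placeholder timings: 1000 s single-threaded, 200 s with 8 threads.
    print(speedup(1000.0, 200.0))
    print(parallel_efficiency(1000.0, 200.0, threads=8))
```

By this measure, a 4~6x speedup at 8 threads corresponds to a parallel efficiency of 50~75%.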

Performance improves when disabling Hyper Threading

Figure 5 shows that performance with Hyper Threading (HT) disabled can be up to 5~10% better than with HT enabled, especially when running with a small number of threads per job.

Figure 5: Disabling Hyper Threading (HT) can improve performance by up to 5~10% compared with HT enabled

Distributed Multithreading performance relies on inter-node communication bandwidth

Cadence recommends running each Spectre X job within a single VM to avoid any inter-node communication overhead. Spectre X does, however, support using cores on other VMs to run multithreaded jobs, which is called “distributed multithreading” (XDP mode). In the “distributed multithreading” scenario, inter-node communication bandwidth is critical to performance. Cadence recommends a connection of at least 10GbE between VMs to reduce latency and improve overall performance.

The Azure Accelerated Networking feature, which is enabled by default for Azure VMs, provides up to 50Gbps of throughput and 10x reductions in network latency among VMs and to front-end storage. Figure 6 shows that Spectre X can enjoy up to a 5% performance boost in Azure over other clouds from the networking component alone.

Figure 6: Spectre X enjoyed up to 5% performance boost in Azure over other clouds in Distributed Multithreading scenarios

Cost-effectiveness Analysis

Figure 7 shows a preliminary cost estimation that considers only the cost of the Compute VMs. The estimation assumes that 500 single-threaded jobs are submitted at the same time, so the number of VMs needed for the 500-job run follows from the number of physical cores per VM. The total elapsed time was then calculated from the results in Figure 3, multiplied by 500. The total cost of the Compute VMs can then be calculated using the cost information in Table 2.

Figure 7: A preliminary cost estimation that considers only the cost of the Compute VMs, assuming 500 single-threaded jobs are submitted at the same time. The total elapsed time is based on Figure 3 and rounded up to an integer, ignoring any interaction between jobs. The total VM cost is based on Table 2 and is also rounded up to an integer.
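
The estimation can be sketched as follows. This is one reading of the method described above, in which the total elapsed time is the single-job time multiplied by the number of jobs and the cost is that total, in hours, multiplied by the price per physical core per hour. The function names and all input values are placeholders, not figures from Table 2 or Figure 3.

```python
import math


def vms_needed(jobs: int, physical_cores_per_vm: int) -> int:
    """VMs required to run every job at once, one job per physical core."""
    return math.ceil(jobs / physical_cores_per_vm)


def total_elapsed_seconds(single_job_seconds: float, jobs: int) -> int:
    """Aggregate elapsed time across all jobs, rounded up to an integer."""
    return math.ceil(single_job_seconds * jobs)


def compute_cost(
    single_job_seconds: float, jobs: int, price_per_core_hour: float
) -> int:
    """Compute-only cost: aggregate core-hours times price, rounded up."""
    core_hours = single_job_seconds * jobs / 3600
    return math.ceil(core_hours * price_per_core_hour)


if __name__ == "__main__":
    # Placeholder inputs: 500 jobs, 120-core VMs, 700 s per job, $0.25/core-hour.
    print(vms_needed(500, 120))
    print(total_elapsed_seconds(700, 500))
    print(compute_cost(700, 500, 0.25))
```

This covers only the compute VMs; license, storage, and networking costs are outside the sketch.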

Please note that the overall cost of running EDA simulations has 2 main parts: EDA license cost and infrastructure cost (including computing, networking, storage, operations, management, etc.). In fact, EDA license cost makes up a substantial proportion of the total cost and is proportional to the elapsed time. With that in mind, HBv3 is considered the preferred Azure VM for running Spectre X simulations, not only because it has the lowest total VM cost, but also because its shortest total elapsed time would significantly reduce license costs. In addition, based on our experience, HBv3 is also the most cost-effective VM for all other circuit simulation EDA tools, including Synopsys HSpice and SiliconSmart, Cadence Liberate, Mentor Eldo/AFS, and Empyrean Qualib and ALPS.

Summary

In this article, we benchmarked Spectre X jobs on different Azure VMs and exercised the following best practices:

Place resources in the same Proximity Placement Group.
Disable Hyper Threading (HT) for all VMs.
Enable Accelerated Networking by default.

We found that the Azure HBv3 series VM performs the best and that the D64 v5 and E64 v5 series VMs rank second; all of them can perform 10~30% better than other clouds. We found that Spectre X performance improves linearly and consistently on Azure when running multithreaded jobs. We verified that disabling HT boosts performance by 5~10%. Spectre X also enjoyed up to a 5% performance boost in Azure over other clouds in Distributed Multithreading scenarios. Finally, we did a preliminary analysis and found that HB120v3 is the most cost-effective VM for Cadence Spectre X.

Learn More

Learn more about Azure High-Performance Compute
Learn more about Azure HBv3 virtual machines; now upgraded
