Ansys RedHawk-SC™ on Azure: Hold on to Your Socks

This post has been republished via RSS; it originally appeared at: Azure Compute articles.

By: Marc Swinnen, Dir. Product Marketing, Semiconductors, Ansys and Andy Chan, Director, Azure Global Solutions, Semiconductor/EDA/CAE

 

What is Ansys RedHawk-SC?

Modern semiconductor integrated circuits (ICs) can contain a staggering 50 billion transistors or more. They would be impossible to design without the software tools grouped under the Electronic Design Automation (EDA) category, which support, automate, and verify every step of the chip design process.

 

RedHawk-SC is an EDA tool developed by Ansys that is the market leader for power integrity and reliability sign-off, a vital step in the design process for all semiconductor chips. Sign-off algorithms are extremely resource-intensive, requiring hundreds of CPU cores running over many hours, which makes RedHawk-SC an ideal application for cloud computing.

 

Designed for the Cloud

RedHawk-SC was architected on a cloud-friendly analysis platform called Ansys SeaScape™. RedHawk-SC’s SeaScape database is fully distributed and thrives on distributed disk access across a network. RedHawk-SC distributes the computational workload across many CPUs, or “workers”, that have low memory requirements – less than 32GB per worker. This elastic compute architecture allows for instant start as soon as just a few workers become available.

 

The distribution of the computational workload is extremely memory efficient, allowing the optimal utilization of over 2,500 CPUs. There is also no need for a heavy master node because the distribution is orchestrated by an ultra-light master scheduler using less than 2GB for even the largest chips. The same is true for loading, viewing, or debugging results.
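As a rough illustration of what the memory figures above mean for sizing, the sketch below estimates how many workers could be placed on a single VM. It is not part of RedHawk-SC or any Azure tooling: the function name, the one-worker-per-core rule, and the example VM shape are all assumptions made for illustration. Only the 32GB-per-worker and 2GB-scheduler bounds come from the text.

```python
def workers_per_vm(vm_cores: int, vm_memory_gb: float,
                   worker_memory_gb: float = 32.0,
                   reserved_memory_gb: float = 2.0) -> int:
    """Estimate how many workers fit on one VM.

    Assumptions (not from Ansys documentation):
      - each worker occupies one core,
      - each worker is budgeted the full 32 GB upper bound,
      - 2 GB is set aside, e.g. for a co-located master scheduler.
    """
    if worker_memory_gb <= 0:
        raise ValueError("worker_memory_gb must be positive")
    usable_gb = max(vm_memory_gb - reserved_memory_gb, 0.0)
    limit_by_memory = int(usable_gb // worker_memory_gb)
    # The tighter of the two limits (cores or memory) wins.
    return min(vm_cores, limit_by_memory)


# Hypothetical VM shape: 16 cores and 128 GB of RAM.
print(workers_per_vm(vm_cores=16, vm_memory_gb=128))  # prints 3
```

Because 32GB is an upper bound, a real deployment would likely fit more workers per VM than this conservative estimate suggests.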

 

RedHawk-SC Workloads on Azure

EDA applications like RedHawk-SC have specific requirements for optimal cloud deployment. We can summarize these considerations with the following points:

  • Sign-off generates very large workloads requiring thousands of CPUs
  • Large design sizes necessitate persistent or distributed storage for data and results in the cloud
  • Worker communication calls for a high-bandwidth network (10Gbps or more)

Ansys and Microsoft have worked together to evaluate the performance of realistic RedHawk-SC workloads on the Azure cloud and how to optimally configure the hardware setup.


Table-1: RedHawk-SC resource requirements for representative small “Block” workloads, medium “Cluster/Partition” workloads, and large “Full Chip” workloads

 

Cloud Compute Models for EDA

Microsoft worked closely with Ansys to develop finely tuned solutions for RedHawk-SC running on Azure’s high-performance computing (HPC) infrastructure. These targeted reference architectures help ease the transition to Azure and allow design teams to run faster at a much lower cost.

 

IC design companies may choose to contract with cloud providers like Azure under an “all-in” model, where the entire design project is conducted in the cloud, or may look for a “hybrid” use model, where cloud resources complement their existing in-house capacity (Figure-1).


Figure-1: Hybrid versus all-in model with both the head and compute nodes in the cloud

 

Ansys and Microsoft Azure have verified that RedHawk-SC successfully accommodates both “all-in” and “hybrid” use models and licensing.

 

Azure infrastructure optimized for EDA

To achieve the fastest possible runtimes, companies typically start by investing in processors that support the highest clock speed available. The cloud raises additional efficiency considerations, such as datacenter efficiency and workflow architecture. Benchmarks show that storage in the cloud is a high-impact architectural component, as are scale technologies. Through extensive testing with realistic workloads, Microsoft and Ansys have recommended an optimized hardware configuration for running RedHawk-SC on Azure, shown in Figure-2 (below). The Azure Silicon team selected the following infrastructure to power this test:

  • AMD EPYC-powered HBv2 VM family
  • Intel Cascade Lake-powered FX VM family
  • Azure NetApp Files
  • CycleCloud Operations Orchestration

Azure NetApp Files is a high-performance, metered NFS file storage service that enables RedHawk-SC's file-based applications to run without the need for code changes. CycleCloud cloud-scaling was used to support RedHawk-SC in orchestrating dynamic VM deployment.


Figure-2: Reference architecture for running Ansys RedHawk-SC on an Azure hybrid cloud

 

RedHawk-SC shows near-linear runtime scaling as the number of CPUs is increased. This is shown for the three different workloads in Graph-1 (below). The favorable scaling reflects the efficient distribution technology underlying RedHawk-SC’s SeaScape architecture.


Graph-1: Runtime required to run various RedHawk-SC workloads on Microsoft Azure as a function of the number of CPUs
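One way to quantify "near-linear" scaling from runtime measurements like those plotted in Graph-1 is parallel efficiency: the observed speedup divided by the ideal speedup, relative to the smallest run. The sketch below is a generic calculation, not an Ansys or Azure utility. The function name is hypothetical and the sample runtimes are made-up placeholders, not the benchmark data behind the graph.

```python
def scaling_efficiency(runs: dict[int, float]) -> dict[int, float]:
    """Return parallel efficiency per CPU count, relative to the smallest run.

    `runs` maps CPU count -> measured runtime (any consistent time unit).
    An efficiency of 1.0 means perfectly linear scaling.
    """
    base_cpus = min(runs)
    base_time = runs[base_cpus]
    efficiency = {}
    for cpus, runtime in sorted(runs.items()):
        speedup = base_time / runtime       # how much faster than the baseline
        ideal_speedup = cpus / base_cpus    # what linear scaling would give
        efficiency[cpus] = speedup / ideal_speedup
    return efficiency


# Placeholder measurements (hours), for illustration only.
sample_runs = {100: 8.0, 200: 4.0, 400: 2.5}
for cpus, eff in scaling_efficiency(sample_runs).items():
    print(f"{cpus} CPUs: efficiency {eff:.2f}")
```

With these placeholder numbers, doubling from 100 to 200 CPUs halves the runtime (efficiency 1.00), while the 400-CPU run reaches 0.80 of the ideal speedup.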

 

A surprising finding is that the total cost of running a RedHawk-SC job on Azure actually decreases as you increase the number of workers, up to an optimum threshold (Graph-2). This contradicts the commonly held assumption that the total cost will increase as you enlist more CPUs. The reason is the very high CPU utilization RedHawk-SC can achieve. The optimal number of CPUs is the number of power partitions automatically calculated by RedHawk-SC.


Graph-2: This plot illustrates the non-intuitive decrease in total Azure costs for RedHawk-SC runs as the number of CPUs is increased to an optimal value - the number of power partitions in RedHawk-SC

 

This result is not intuitively obvious, and it indicates that customers should not try to reduce the CPU count to save money. Instead, they should increase their CPU count to the optimal value to achieve both lower cost and a faster runtime.
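The shape of this cost curve can be reproduced with a deliberately simple toy model. This is not the cost model used by Ansys or Microsoft, and every number in it is hypothetical. It assumes that each power partition takes the same time on one worker, that partitions are processed in "waves" of one partition per worker, and that some fixed hourly cost (for example storage or a head node) accrues for as long as the job runs.

```python
import math


def job_cost(workers: int, partitions: int, hours_per_partition: float,
             worker_rate: float, fixed_rate: float) -> float:
    """Total cost of one job under the toy model (all inputs hypothetical).

    worker_rate: hourly price of one worker.
    fixed_rate:  hourly price of everything that runs for the whole job.
    """
    waves = math.ceil(partitions / workers)   # rounds of partitions per worker
    runtime_hours = waves * hours_per_partition
    return runtime_hours * (workers * worker_rate + fixed_rate)


def cheapest_worker_count(partitions: int, hours_per_partition: float,
                          worker_rate: float, fixed_rate: float,
                          max_workers: int) -> int:
    """Return the worker count with the lowest toy-model cost."""
    return min(
        range(1, max_workers + 1),
        key=lambda w: job_cost(w, partitions, hours_per_partition,
                               worker_rate, fixed_rate),
    )


# Hypothetical job: 8 partitions, 1 hour each, 1 cost unit per worker-hour,
# 4 cost units per hour of fixed overhead.
print(cheapest_worker_count(8, 1.0, 1.0, 4.0, max_workers=16))  # prints 8
```

In this toy setup the cost falls from 40 units with one worker to 12 units with eight workers, then rises again once workers outnumber partitions and sit idle. The minimum lands at the partition count, which matches the behavior described for Graph-2 under these assumptions.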

 

Conclusion

Extensive testing of RedHawk-SC on Azure has allowed Microsoft to identify an optimized VM configuration for cloud-based EDA work. This configuration has demonstrated excellent scalability to over 2,500 CPUs on a range of realistic, very large EDA workloads. The testing further identified the optimal number of CPUs to minimize the total cost of running RedHawk-SC on Azure. As a result, customers can easily set up their power integrity sign-off analysis jobs on Azure with optimal configurations for both throughput and cost.

 

For further information contact your local sales representative or visit www.ansys.com.

 

Join us at Ansys Simulation World 2021 April 20th - 21st.  Register here.
