NEW REFERENCE ARCHITECTURE: Distributed training of deep learning models on Azure

This post has been republished via RSS; it originally appeared at: Azure Global articles.


Our sixth AI reference architecture (on the Azure Architecture Center) is authored by AzureCAT Mathew Salvaris, edited by Nanette Ray, and published by Mike Wasson.

Reference architectures provide a consistent approach and best practices for a given solution. Each architecture includes recommended practices, along with considerations for scalability, availability, manageability, security, and more. This architecture includes a deployable solution as well. The full array of reference architectures is available on the Azure Architecture Center.

[Architecture diagram: deep_learning_models_refarch.png]

This reference architecture shows how to conduct distributed training of deep learning models across clusters of GPU-enabled virtual machines (VMs). The scenario is image classification, but the solution can be generalized for other deep-learning scenarios, such as segmentation and object detection.
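To make "distributed training across a cluster" concrete, here is a minimal single-process sketch of synchronous data-parallel training. This is one common way to spread training over several GPU VMs. Each hypothetical "worker" stands in for one VM: it computes a gradient on its own shard of the data, the gradients are averaged (a simulated allreduce), and every replica applies the same update. The toy linear model, worker count, learning rate, and helper names (`shard`, `local_gradient`, `allreduce_mean`, `train`) are illustrative assumptions. They are not part of the reference architecture, and this is not how Batch AI or any specific framework implements the technique.

```python
"""Single-process simulation of synchronous data-parallel training.

Assumption: an illustrative sketch only. Each "worker" represents one GPU VM,
and the allreduce is simulated by averaging Python floats.
"""
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair for a 1-D linear model y ~ w*x + b


def shard(data: List[Sample], num_workers: int) -> List[List[Sample]]:
    """Split the dataset round-robin so each worker gets a disjoint shard."""
    return [data[i::num_workers] for i in range(num_workers)]


def local_gradient(w: float, b: float, batch: List[Sample]) -> Tuple[float, float]:
    """Mean-squared-error gradient w.r.t. w and b on one worker's shard."""
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return gw, gb


def allreduce_mean(grads: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Simulated allreduce: every worker ends up with the averaged gradient."""
    k = len(grads)
    return sum(g[0] for g in grads) / k, sum(g[1] for g in grads) / k


def train(data: List[Sample], num_workers: int, steps: int, lr: float) -> Tuple[float, float]:
    shards = shard(data, num_workers)
    w, b = 0.0, 0.0  # all replicas start from identical parameters
    for _ in range(steps):
        # 1. Each worker computes a gradient on its own shard
        #    (these would run in parallel on real hardware).
        grads = [local_gradient(w, b, s) for s in shards]
        # 2. Gradients are averaged across all workers.
        gw, gb = allreduce_mean(grads)
        # 3. Every replica applies the same update, so parameters stay in sync.
        w -= lr * gw
        b -= lr * gb
    return w, b


if __name__ == "__main__":
    # Synthetic, noise-free data from y = 3x + 1, purely for illustration.
    data = [(x / 10, 3 * (x / 10) + 1) for x in range(40)]
    w, b = train(data, num_workers=4, steps=2000, lr=0.05)
    print(f"w={w:.3f}, b={b:.3f}")
```

Because the shards are equal in size, the averaged gradient equals the full-batch gradient. Adding workers therefore splits the per-step compute without changing the optimization trajectory, which is the property that makes scaling out across GPU VMs worthwhile.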

This architecture consists of the following components:

  • Azure Batch AI plays the central role in this architecture, scaling the cluster of GPU-enabled VMs up and down as needed.
  • Blob storage is used to stage the data.
  • Azure Files is used to store the scripts, logs, and the final results from the training.
  • Batch AI file server is a single-node NFS share used in this architecture to store the training data.
  • Docker Hub is used to store the Docker image that Batch AI uses to run the training. Azure Container Registry can also be used.


Topics covered include:

Head over to the Azure Architecture Center to learn more about the Distributed training of deep learning models on Azure reference architecture.


See Also

Additional related AI reference architectures:

Find all our reference architectures on the Azure Architecture Center.


AzureCAT Guidance

"Hands-on solutions, with our heads in the Cloud!"
