We recently announced the public preview of managed online endpoints in Azure Machine Learning. Today we are excited to add a new feature to this capability: you can now deploy Triton format models in Azure Machine Learning with managed online endpoints.
Triton is multi-framework, open-source software that is optimized for inference. It supports popular machine learning frameworks like TensorFlow, ONNX Runtime, PyTorch, NVIDIA TensorRT, and more, and it can be used for CPU or GPU workloads. You can deploy Triton format models using both the CLI (command line) and Azure Machine Learning studio.
Deploy a model using the Azure Machine Learning CLI (v2)
1. Prerequisites
The Azure CLI and the ml extension to the Azure CLI. For more information, see Install, set up, and use the CLI (v2) (preview).
Clone the azureml-examples GitHub repository.
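A minimal sketch of the setup, using the `ml` extension and repository named in the prerequisites above:

```bash
# Install the ml extension for the Azure CLI (CLI v2)
az extension add --name ml

# Clone the examples repository and switch to the CLI samples folder
git clone https://github.com/Azure/azureml-examples
cd azureml-examples/cli
```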
2. Create endpoint
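A sketch of what creating the endpoint could look like; the endpoint name `my-triton-endpoint` and the `aml_token` auth mode are illustrative choices, not values from the original sample:

```bash
# Define a managed online endpoint (the endpoint name below is illustrative)
cat > create-endpoint.yml <<'EOF'
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-triton-endpoint
auth_mode: aml_token
EOF

# Create the endpoint
az ml online-endpoint create -f create-endpoint.yml
```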
3. Create deployment
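The deployment references the Triton format model directly; notice that there is no `code_configuration` (scoring script) or `environment` section. The deployment name `blue`, the model name, and the `Standard_NC6s_v3` GPU instance type below are assumptions for illustration, and the exact YAML schema fields may differ across CLI versions:

```bash
# Define a deployment that serves the registered Triton format model
cat > create-deployment.yml <<'EOF'
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-triton-endpoint
model:
  name: sample-densenet-onnx-model   # illustrative model name
  version: 1
  path: ./models
  type: triton_model
instance_type: Standard_NC6s_v3
instance_count: 1
EOF

# Create the deployment and route all endpoint traffic to it
az ml online-deployment create -f create-deployment.yml --all-traffic
```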
4. Invoke your endpoint
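Because Triton serves the KFServing v2 REST protocol, a simple way to exercise the endpoint is a readiness check against the server. This sketch assumes the endpoint name from the earlier steps; for actual scoring, the sample repository includes a Python client that sends an image to the model:

```bash
# Fetch the endpoint's scoring URI and an access token
scoring_uri=$(az ml online-endpoint show -n my-triton-endpoint --query scoring_uri -o tsv)
auth_token=$(az ml online-endpoint get-credentials -n my-triton-endpoint --query accessToken -o tsv)

# Strip the trailing path segment to get the base URI, then check readiness
base_uri=${scoring_uri%/*}
curl --request GET "$base_uri/v2/health/ready" -H "Authorization: Bearer $auth_token"
```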
5. Delete your endpoint and model
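Cleanup might look like the following; note that the CLI (v2) archives models rather than hard-deleting them, and the names match the illustrative values used above:

```bash
# Remove the endpoint and all of its deployments
az ml online-endpoint delete --name my-triton-endpoint --yes --no-wait

# Archive the registered model so it no longer appears in listings
az ml model archive --name sample-densenet-onnx-model --version 1
```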
Deploy a model using Azure Machine Learning studio
1. Register your model in Triton format using the following YAML and CLI command.
Get the sample model from our samples GitHub repository: azureml-examples/cli/endpoints/online/triton/single-model at main · Azure/azureml-examples (github.com)
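As a minimal sketch, registration could look like this; the model name `sample-densenet-onnx-model` is illustrative, and the `type: triton_model` field reflects the CLI (v2) model schema, which may differ between preview versions:

```bash
# Register the folder containing the Triton model repository
cat > create-triton-model.yml <<'EOF'
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: sample-densenet-onnx-model   # illustrative model name
version: 1
path: ./models
type: triton_model
EOF

az ml model create -f create-triton-model.yml
```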
2. Deploy from the Endpoints or Models page in Azure Machine Learning studio.
When you deploy a Triton format model, no scoring script or environment is required.
Summary
The Azure Machine Learning and NVIDIA Triton Inference Server integration is designed to make your model deployment experience smoother.
Resources
Documentation: High-performance serving with Triton Inference Server