This post has been republished via RSS; it originally appeared at: Windows Blog.
The future of AI is hybrid, drawing on the respective strengths of cloud and client while harnessing every Windows device to achieve more. At Microsoft, we are reimagining what’s possible by bringing powerful AI compute directly to Windows devices, unlocking a new era of intelligence that runs where you are. With groundbreaking advances in silicon, a modernized software stack and deep OS integration, Windows 11 is becoming the world’s most open and capable platform for local AI.

Today we are excited to share that Windows ML is now generally available for production use, helping developers ship production experiences in the evolving AI landscape. First introduced at Build 2025, Windows ML is the built-in AI inferencing runtime optimized for on-device model inference, with streamlined model dependency management across CPUs, GPUs and NPUs. It serves as the foundation for Windows AI Foundry and is used by Foundry Local to enable the expanded silicon support also being released today.

By harnessing the power of CPUs, GPUs and NPUs from our vibrant silicon partner ecosystem, and building on ONNX’s strong momentum, Windows ML empowers developers to deliver real-time, secure and efficient AI workloads right on the device. Running models locally lets developers build AI experiences that are more responsive, private and cost-effective, reaching users across the broadest range of Windows hardware.

https://youtu.be/Mow9UY_9Ab4

Bring your own model and deploy efficiently across silicon – securely and locally on Windows
Windows ML is compatible with ONNX Runtime (ORT), so developers can use familiar ORT APIs and transition existing production workloads with minimal changes. Windows handles distribution and maintenance of ORT and the execution providers, taking that responsibility off the app developer. Execution providers (EPs) are the bridge between the core runtime and the powerful, diverse silicon ecosystem, enabling independent optimization of model execution on the different chips from AMD, Intel, NVIDIA and Qualcomm. With ONNX as its model format, Windows ML integrates smoothly with current models and workflows: developers can use their existing ONNX models, or convert and optimize their source PyTorch models with the AI Toolkit for VS Code, and deploy across Windows 11 PCs.
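One common step when optimizing a converted model for on-device inference is quantization. As a rough illustration of what such a step does, here is the core arithmetic of symmetric int8 weight quantization in plain Python. This is only a sketch of the underlying math, not a Windows ML or AI Toolkit API; the real tooling operates on whole ONNX graphs rather than bare lists of weights.

```python
# Sketch of symmetric int8 quantization -- the arithmetic behind one of
# the optimization steps applied when preparing models for on-device
# inference. Illustrative only; real tooling works on full ONNX graphs.

def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map float weights onto int8 range [-127, 127] with one scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored weight is within half a quantization step of the original.
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(weights, restored))
```

Storing each weight in one byte instead of four is where the size and memory-bandwidth savings of quantized models come from, at the cost of the small rounding error bounded above.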
- Simplified Deployment: Our infrastructure APIs allow developers to support various hardware architectures without multiple app builds by leveraging execution providers available on the device or by dynamically downloading them. Developers also have the flexibility to precompile their models ahead-of-time (AOT) for a streamlined end-user experience.
- Reduced App Overhead: Windows ML automatically detects the user’s hardware and downloads the appropriate execution providers, eliminating the need to bundle the runtime or EPs in a developer’s application. This streamlined approach saves developers tens to hundreds of megabytes in app size when targeting a broad range of devices.
- Compatibility: Through collaboration with our silicon partners, Windows ML aims to maintain conformance and compatibility, supporting ongoing updates while ensuring model accuracy across different builds through a certification process.
- Advanced Silicon Targeting: Developers can assign device policies to optimize for low power (NPU), high performance (GPU) or specify the silicon used for a model.
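The behavior described above, where a device policy steers a model onto the most suitable available silicon, can be sketched in plain Python. All names below are illustrative stand-ins, not the real Windows ML API; the actual runtime performs hardware detection and EP download on the developer’s behalf.

```python
# Illustrative sketch only: models the device-policy EP selection
# described above. None of these names are the real Windows ML API.

# Hypothetical preference order implied by each device policy.
POLICY_PREFERENCES = {
    "low_power": ["NPU", "CPU"],                # favor the NPU for efficiency
    "high_performance": ["GPU", "NPU", "CPU"],  # favor the GPU for speed
    "default": ["NPU", "GPU", "CPU"],
}

def select_execution_provider(policy: str, available: set[str]) -> str:
    """Pick the first preferred EP the device actually has; CPU is the fallback."""
    for ep in POLICY_PREFERENCES.get(policy, POLICY_PREFERENCES["default"]):
        if ep in available:
            return ep
    return "CPU"  # every Windows 11 device can at least run on CPU

# A Copilot+ style PC with an NPU, under a low-power policy:
print(select_execution_provider("low_power", {"CPU", "GPU", "NPU"}))  # NPU
# A desktop with a discrete GPU but no NPU, tuned for performance:
print(select_execution_provider("high_performance", {"CPU", "GPU"}))  # GPU
```

The key design point is that the app states an intent (low power, high performance, or a specific chip) once, and the same binary resolves to different silicon on different machines.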
Windows ML, optimized for the latest hardware in collaboration with our silicon partners
Windows 11 has a diverse hardware ecosystem that includes AMD, Intel, NVIDIA and Qualcomm and spans CPUs, GPUs and NPUs. Consumers can choose from a wide range of Windows PCs, and that variety empowers developers to create innovative local AI experiences. We worked closely with our silicon partners to ensure that Windows ML can fully leverage their latest CPUs, GPUs and NPUs for AI workloads. Silicon partners build and maintain execution providers, which Windows ML distributes, manages and registers to run AI workloads performantly on-device, giving developers a hardware abstraction layer and a path to optimal performance on each specific chip.

AMD has integrated Windows ML support across its Ryzen AI platform, enabling developers to harness the power of AMD silicon via AMD’s dedicated Vitis AI execution provider on NPU, GPU and CPU. Learn more.

“By integrating Windows ML support across our Ryzen AI platform, AMD is making it easier for developers to harness the combined power of our CPUs, GPUs and NPUs. Together with Microsoft, we’re enabling scalable, efficient and high-performance AI experiences that run seamlessly across the Windows ecosystem.” - John Rayfield, corporate vice president, Computing and Graphics Group, AMD

Intel’s EP combines OpenVINO AI software performance and efficiency with Windows ML, empowering AI developers to easily choose the optimal XPU (CPU, GPU or NPU) for their AI workloads on Intel Core Ultra processor-powered PCs. Learn more.

“Intel’s collaboration with Microsoft on Windows ML empowers developers to effortlessly deploy their custom AI models and applications across CPUs, GPUs and NPUs on Intel’s AI-powered PCs. With the OpenVINO framework, Windows ML accelerates the delivery of cutting-edge AI applications, enabling faster innovation with unmatched efficiency and unlocking the full potential of Intel Core Ultra processors.” - Sudhir Tonse Udupa, vice president, AI PC Software Engineering, Intel

NVIDIA’s TensorRT for RTX EP enables AI models to execute on NVIDIA GeForce RTX and RTX PRO GPUs using NVIDIA’s dedicated Tensor Core libraries for maximum performance. This lightweight EP generates optimized inference engines (instructions for how to run the AI model) tailored to the system’s specific RTX GPU. Learn more.

“Windows ML with TensorRT for RTX delivers over 50% faster inferencing on NVIDIA RTX GPUs compared to DirectML in an easy-to-deploy package, enabling developers to scale generative AI across over 100 million Windows devices. This combination of speed and reach empowers developers to create richer AI experiences for Windows users.” - Jason Paul, vice president, Consumer AI, NVIDIA

Qualcomm Technologies and Microsoft worked together to optimize Windows ML AI models and apps for the Snapdragon X Series NPU using the Qualcomm Neural Network Execution Provider (QNN EP), as well as the GPU and CPU through integration with ONNX Runtime EPs.

Enabling local AI in the Windows software ecosystem
While developing Windows ML, we prioritized feedback from app developers building AI-powered features, working with them to test the integration during the public preview. Leading software developers such as Adobe, BUFFERZONE, Dot Inc., McAfee, Reincubate, Topaz Labs and Wondershare, among many others, are adopting Windows ML in their upcoming releases, accelerating the spread of local AI capabilities across a broad spectrum of applications. By leveraging Windows ML, our software partners can focus on building unique AI-powered features without worrying about hardware differences. Their early adoption and feedback show strong momentum toward local AI, enabling faster development and unlocking new local AI experiences across a variety of use cases:
- Adobe Premiere Pro and Adobe After Effects – accelerated semantic search of content in the media library, tagging audio segments by type and detecting scene edits, all powered by the local NPU in upcoming releases, with plans to progressively migrate the full library of existing on-device models to Windows ML.
- BUFFERZONE enables real-time secure web page analysis, protecting users from phishing and fraud without sending sensitive data to the cloud.
- Camo by Reincubate leverages real-time image segmentation and other ML techniques to improve webcam video quality for streaming and presenting, using the NPU across all silicon providers.
- Dot Vista by Dot Inc. supports hands-free voice control and optical character recognition (OCR) for accessibility scenarios, including deployments in healthcare environments using NPUs in Copilot+ PCs.
- Filmora by Wondershare uses AI-powered body effects optimized for NPU acceleration on AMD, Intel and Qualcomm platforms, including real-time preview and application of Body effects such as Lightning Twined, Neon Ring and Particle Surround.
- McAfee automatically detects deepfake videos and other scam vectors that users can encounter on social networks.
- Topaz Photo by Topaz Labs is a professional-grade image enhancement application that lets photographers sharpen details, restore focus and adjust levels on every shot they take - all powered by AI.
Simplified tooling for Windows ML
Developers can take advantage of Windows ML by starting with a robust set of tools for simplified model deployment. AI Toolkit for VS Code provides powerful tools for model and app preparation, including ONNX conversion from PyTorch, quantization, optimization, compilation and evaluation, all in one place. These features make it easier to prepare and deploy efficient models with Windows ML, eliminating the need for multiple builds and complex logic. Starting today, developers can also try custom AI models with Windows ML in AI Dev Gallery, an interactive workspace that makes it easier to discover and experiment with AI-powered scenarios using local models.

Get started today
With Windows ML now generally available, Windows 11 provides a local AI inference framework that’s ready for production apps. Windows ML is included in the Windows App SDK (starting with version 1.8.1) and supports all devices running Windows 11 24H2 or newer. To get started developing with Windows ML:
- Update your project to use the latest Windows App SDK
- Call the Windows ML APIs to initialize EPs, and then load any ONNX model and start inferencing in just a few lines of code. For detailed tutorials, API reference and sample code, visit ms/TryWinML
- For interactive samples of custom AI models with Windows ML, try the AI Dev Gallery at ms/ai-dev-gallery
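The three steps above can be sketched end to end. The snippet below simulates the flow in plain Python; the class and method names are hypothetical stand-ins, not the real Windows App SDK API, so treat it as a shape of the workflow and consult ms/TryWinML for the actual calls.

```python
# Hypothetical sketch of the getting-started flow described above.
# None of these classes exist in the Windows App SDK; they stand in
# for the real Windows ML calls documented at ms/TryWinML.

class ExecutionProviderCatalog:
    """Stand-in for the OS-managed EP catalog Windows ML exposes."""
    def ensure_and_register_all(self) -> list[str]:
        # Windows ML would detect hardware and download/register EPs here.
        return ["CPUExecutionProvider"]

class InferenceSession:
    """Stand-in for an ORT-style session over an ONNX model."""
    def __init__(self, model_path: str, providers: list[str]):
        self.model_path = model_path
        self.providers = providers

    def run(self, inputs: dict) -> dict:
        # A real session would execute the ONNX graph on the chosen EP.
        return {"output": [0.0] * len(inputs.get("input", []))}

# Steps 1-2: register whatever EPs the device supports, then load a model.
providers = ExecutionProviderCatalog().ensure_and_register_all()
session = InferenceSession("model.onnx", providers=providers)
# Step 3: run inference.
result = session.run({"input": [1.0, 2.0, 3.0]})
print(len(result["output"]))  # 3
```

The point of the shape is that EP registration happens once at startup, after which model loading and inference follow the familiar ORT session pattern in just a few lines.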