Running Nextflow Data Pipelines on Azure Batch

This post has been republished via RSS; it originally appeared at: Azure Compute Blog articles.

We are delighted to announce today that the popular Nextflow workflow manager now supports running data pipelines seamlessly on Microsoft Azure via the Azure Batch service. This is a significant step forward for our customers, particularly those in Life Sciences, who are migrating genomics, machine learning, and other data pipeline or workflow oriented applications on to Azure.


Workflow managers such as Nextflow provide engineers and scientists an easy way of constructing multi-stage workloads with complex task relationships and data dependencies. Importantly, they also help abstract away the underlying task scheduler and execution platform, making users’ workloads easily portable.


The Azure-Nextflow integration was jointly developed by Microsoft and Seqera Labs and is released today in beta. Seqera Labs are the creators of the Nextflow open source project and the Nextflow Tower application stack for pipelines management. The Azure Batch integration is made available in both the Nextflow open-source project as well as the Nextflow Tower product for seamless cloud execution and hybrid cloud bursting.


With this integration, Nextflow users can now select Azure Batch as an executor and utilize Azure Blob Storage for storing data for their pipelines. Batch in turn autoscales the pools of compute nodes and schedules tasks to run on the nodes. Users only need to specify a base configuration such as the number of CPUs, the region, and the Storage Account that they wish to use in order to seamlessly execute their containerized pipelines on Azure.


"We are incredibly excited by this OSS contribution made by Microsoft to implement support for Azure Batch in Nextflow. This represents a major milestone for the project and provides the entire Nextflow community with a powerful and established cloud platform to deploy their pipelines." Paolo Di Tommaso - CTO and co-founder, Seqera Labs


Find out more about running Nextflow pipelines on Azure on the Nextflow blog. For more information or to apply to participate in the beta program, reach out to Seqera Labs at



About Azure Batch

Azure Batch provides cloud-scale job scheduling and compute management and is used to run large-scale parallel and high-performance computing (HPC) batch jobs efficiently in Azure. Azure Batch creates and manages a pool of compute nodes (virtual machines), installs the applications you want to run, and schedules jobs to run on the nodes. There's no cluster or job scheduler software to install, manage, or scale. Instead, you use Batch APIs and tools, command-line scripts, or the Azure portal to configure, manage, and monitor your jobs.


About Seqera Labs

Seqera Labs develops solutions to simplify complex data analysis pipelines. Our software enables developers and data scientists to create and securely deploy data applications in the cloud or on traditional on-premise infrastructure. The core open-source technology Nextflow transforms the building of massively scalable and distributed computing solutions. Seqera is now delivering results for customers across pharma, genomics and biotech, breaking with the status quo of closed-platforms and custom scripts and enabling them to embrace the future of distributed data analysis in the cloud.

REMEMBER: these articles are REPUBLISHED. Your best bet to get a reply is to follow the link at the top of the post to the ORIGINAL post! BUT you're more than welcome to start discussions here:

This site uses Akismet to reduce spam. Learn how your comment data is processed.