Coyote: Making it easier for developers to build reliable asynchronous software

This post has been republished via RSS; it originally appeared at: Microsoft Research.

For developers, writing bug-free software that doesn’t crash is getting difficult in an increasingly competitive world where software needs to ship before it becomes obsolete. This challenge is especially apparent with online cloud services, which are often dictated by aggressive shipping deadlines. Cloud services are distributed programs comprising multiple back-end systems that continuously exchange asynchronous signals while responding to incoming web requests. They are complex by nature, hard to get right, and require protection from failures that could jeopardize client data or halt key services.

Such a programming environment is full of non-determinism, or scenarios outside developers’ control. For example, there’s non-determinism in the scheduling of concurrent operations, the order in which messages are received, the random system failures, and the random firing of timers, either for retry logic or timeouts from other services that have become unresponsive. Non-deterministic systems exist in all software domains, not just cloud services, and best practices for building and testing these systems fall short. Techniques such as failure injection and stress testing can either be too complicated to set up or time-consuming with no guarantees that found bugs can be reproduced. Consider a cloud service that, let’s say, implements the Raft consensus protocol among a group of machines in an effort to provide a highly reliable fault-tolerant cluster to clients. Such a system will have hundreds of messages flying back and forth between the machines. You do stress testing and don’t find any bugs, but can you really be confident that you’re ready to ship?

We’re excited to announce the release of Coyote, an open-source .NET framework from Microsoft Research that guides developers toward designing, implementing, and testing code in a way that embraces non-determinism and asynchrony and helps them create asynchronous systems quickly and confidently. Instead of trying to hide non-determinism, Coyote helps explicitly model non-determinism in a system and uses the information to provide a state-of-the-art testing tool. This advanced testing tool can control every source of non-determinism defined, including the exact order of every asynchronous operation, which allows it to systematically explore all the possibilities. The tool runs very quickly and reaches unheard-of levels of coverage of all non-deterministic choices in code, enabling it to find most of the tricky bugs in a way that’s also trivial to reproduce and debug.

A result of years of investment from Microsoft Research in the space of program verification and testing, Coyote is being used to build various components of Microsoft Azure Compute, such as Azure Batch, and Microsoft Azure Blockchain. The framework has received positive feedback from the Azure teams using it. One engineer said, “Features developed in Coyote test mode worked perfectly in production first time,” while another noted, “A feature that took six months without Coyote was developed in one month using Coyote.” Engineers expressed experiencing a “significant confidence boost” as a result, allowing them to “churn [out] code much faster than before.”

Coyote programming models

Coyote, which evolved from a previous Microsoft Research project called P#, is a combination of a programming model, a lightweight runtime, and a testing infrastructure all packaged as a portable library with minimal dependencies. The framework supports two main programming models: an asynchronous tasks programming model (in preview) and an asynchronous actors programming model.

If you’re happy developing your code using C# async/await construct for asynchronous tasks, then Coyote can add value on top of that. If you switch to the Coyote task library, the Coyote testing tool will look for bugs by systematically exploring the concurrency between your tasks. However, while the C# async/await feature is wonderful, it sometimes yields code that is too parallel, resulting in a lot of complexity. For example, when performing two or more concurrent tasks, you may need to guard private data with locks, and then you have to worry about deadlocks. Coyote offers an alternative that solves this with the more advanced asynchronous actors programming model.

Actors constrain your parallelism so that a given actor receives messages in a serialized order via an inbox. Actor models have gained a lot of popularity, especially in the area of distributed systems, precisely because they help manage the complexity of a system. Actors essentially embrace asynchrony by making every message between actors an async operation. Coyote fully understands the semantics of actors and can do a world-class job of testing them and finding even the most subtle bugs. The framework goes one step further, providing a type of actor called a state machine, which it knows how to fully test, ensuring every state is covered and every state transition is tested.

The above animation shows Coyote testing in action on a five-server Raft implementation that was written using Coyote. The Coyote testing tool controls message ordering intelligently and, in this case, finds a bug in the implementation where two leaders get elected. For better visualization, the animation has been slowed down from the actual testing speed.

Building blocks of Coyote applications

The Coyote programming models are easy to use, so even with minimal investment, you get the huge upside of a powerful testing tool that automatically finds bugs in your code. And the more time and resources you invest in Coyote, the greater the benefits. Coyote provides the following building blocks for more reliable software:

Task: a wrapper on .NET tasks that allows the Coyote testing tool to take control of scheduling
Actor, StateMachine, and Event: base classes for the Coyote actors programming model
Specification and Monitor: ways to embed checks into code that can be verified at test time; this also includes easy ways of monitoring liveness, ensuring that code doesn’t get stuck spinning its wheels
Timers: a way to model timing activities in a system, which is especially useful in the design of mocks that model external systems
Logging: a feature that allows you to see debug messages in context with decisions being made during a Coyote test run, including nice ways to visualize what’s happening

In addition to the above constructs, Coyote allows you to use the full power of the C# programming language. To get the best test performance, we recommend mocking all the systems outside your control. This allows the Coyote testing tool to test code locally on a laptop. The following example—a shopping cart system with all external services written as Coyote mock actors—shows a typical test setup:

To get the best test performance from the Coyote framework, it’s recommended that developers mock all the systems outside their control. Above is a typical test setup, a shopping cart system with all external services written as Coyote mock actors using the asynchronous actors programming model.

Larger teams can share their Coyote mocks for improved code reuse in testing. In fact, you can publish your Coyote mocks as a precise protocol definition of your public services. The Coyote testing tool can then be used to fully certify that new customer code is working properly with the mock model of the service before customers even attempt to use your production APIs.

Coyote mocks can be more sophisticated than normal mocks. They not only specify the asynchronous API required to talk to a service, but they can also serve as a rich model of how the service is expected to behave. Most teams are already building mocks, so switching that over to work with Coyote usually requires minimal effort.

Learn more and contribute

The Coyote package is available on NuGet, so getting started with Coyote is very simple. Coyote is also open source on GitHub and available to all who want to provide feedback and suggestions. We’d love to see your pull requests if you have specific ideas on how to improve Coyote. You can also learn more about the research behind Coyote.

We hope you, too, can benefit from more confident coding of asynchronous systems using Coyote!

The post Coyote: Making it easier for developers to build reliable asynchronous software appeared first on Microsoft Research.

Coyote programming models

Building blocks of Coyote applications

Learn more and contribute

Leave a Reply Cancel reply