Rocket-fast embedded TypeScript for MakeCode Arcade

This post has been republished via RSS; it originally appeared at: Microsoft Research.

When we began developing Microsoft MakeCode, a computing education platform, it was all about making programming easier, more engaging, and just plain friendlier. After all, if we were going to inspire the next generation of coders, easier entry into the world of computer science would be important. To accomplish this, we designed a high-level programming language with text and graphical input modalities along with a simple yet powerful set of programming libraries. Today, the language, Static TypeScript (STS), is used in all seven MakeCode editors, and its ability to meet the demands of programs requiring higher levels of performance is on full display in the latest addition to the platform, MakeCode Arcade.

With Arcade, STS lets developers of all skill levels easily write cool retro-style pixelated games—think Super Mario Bros., but even better because it’s designed by the user—to be run either inside a virtual game console in the browser or on inexpensive microcontroller-based handhelds.

STS, a subset of TypeScript, is an evolution of previous projects on beginner-friendly programming environments that started in the Research in Software Engineering group in late 2010. This work was eventually extended to include the programming of microcontrollers (MCUs), small computers typically used in refrigerators, electric toothbrushes, car control systems, and countless other everyday objects. Unfortunately, MCUs are usually programmed in C, C++, or straight in assembly, none of which are particularly beginner friendly. So over the years, efforts have been underway to run modern languages such as JavaScript and Python on them. Because full-fledged JIT compilers like V8 or ChakraCore require megabytes, not kilobytes of memory—a typical MCU has between 2 KB and 256 KB of RAM—these languages usually involve interpreters like IoT.js, Duktape, or MicroPython. The problem with interpreters is high memory usage—though not nearly as high as the demand from JIT compilers—leaving little room on the devices themselves for the program developers have written. STS, compiled in the MakeCode web application, is a more efficient alternative to the embedded interpreter approach, and statically typed, it makes for a less surprising programming experience.

Throwing back with MakeCode Arcade

While memory usage requirements initially guided the design of the MakeCode runtime system and slow execution time isn’t generally a significant problem for most MakeCode editors, execution performance became paramount with Arcade. Arcade sports a 160 × 120 pixel screen with a palette of 16 colors, a four-channel retro-appropriate sound synthesizer, and six game buttons plus menu and reset. Arcade hardware runs on MCUs with around 100 KB of RAM running at around 100 megahertz.

Handheld devices produced by Microsoft hardware partners use Arm Cortex-M4 MCUs, which have around 100 KB of RAM running at around 100 megahertz. Complex MakeCode Arcade games run at about 30 frames per second on the devices.

While the console specification—save for the speed—harks back to the 1980s, Arcade provides programmers with decidedly 2010s high-level, easy-to-use APIs. Simple games can be put together in just a few lines of code or a few blocks snapped together depending on whether you’re using the text or block editor. From there, games can be made progressively more involved—all the way to custom AI-run opponents.

Like other MakeCode editors, Arcade includes a simulator and offers programmers the choice of writing games in blocks (left) or text (right). Most programmers using the text editor are writing very simple programs that, because of type inference, look exactly the same in TypeScript and JavaScript, so for name recognition reasons, the text editor is labeled JavaScript.

To provide enough performance while still allowing a high-level of abstraction, MakeCode compiles a subset of TypeScript directly to Arm Thumb machine code, all locally in the MakeCode web application, so no installation is required. After the first load, the MakeCode web app runs fully offline, as it doesn’t rely on cloud service for compilation. Programs are then transferred using a familiar USB flash drive mechanism or directly with the new WebUSB standard.

To support static compilation, in defining STS, we dropped a number of JavaScript “features” of limited use in education, such as eval, with, and prototype-based inheritance, instead going with more traditional classes. While this prevents running of existing JavaScript libraries on MCUs, most of these libraries are not applicable or would need rewriting to fit the memory constraints. Moreover, it is perfectly possible to write complex programs without using these features; for example, the vast majority of MakeCode web app code, itself written in regular TypeScript, doesn’t use them.

Compilation and runtime system

The compiler is quite simple as far as compilers go. It uses a simple stack-based model for temporary values, and there’s no register allocation. In fact, there are no optimizations to speak of except for a simple peep-hole pass on the resulting assembly code. All values are represented uniformly as 32-bit words.

All numbers are conceptually 64-bit floating point values as mandated by JavaScript semantics. However, when the value happens to be a signed integer fitting in 31 bits, it’s stored as a tagged integer; otherwise, it’s a pointer to a boxed double. Implementations of arithmetic operations check if both arguments are integers and, if so, execute the few assembly instructions needed; otherwise, they call into a software floating-point library.

Most other values are pointers to objects either in the garbage-collected heap or read-only flash (MCUs usually have about four times as much flash as RAM, so it’s important not to copy read-only data to RAM). The first word of such objects points to a virtual table, and the subsequent words contain user-defined data fields. The v-table contains runtime type information and pointers to various methods of the object, laid out based on class hierarchy, as well as a list of properties of the object and the memory addresses where they reside.

Following regular TypeScript, the compiler doesn’t generate any runtime type checks on casts. Instead, checks are generated upon usage. A property access is compiled depending on the static type of the accessed object: For classes, it’s a simple memory lookup. For interfaces, we consult the hashed list of properties in the v-table, which is twice as slow, and for dynamic key-value mappings, we use a linear lookup, which is around six times slower (typically, this is the only technique used in embedded interpreters).

Objects—including classes, dynamic maps, binary buffers, and function closures—are allocated on a garbage-collected heap. The garbage collection (GC) algorithm is a simple, nonmoving—yet precise—mark-and-sweep. We previously used a reference counting allocator; however, given the complexity of control flow, including between concurrent threads of execution; the simplicity of the compiler; and the fact that all values are potentially reference counted, including numbers, we had to update reference counts on every function call. Switching to GC improved speed by a factor of two.

The generated code uses a C++ runtime, which is compiled once in the cloud, and the generated code is linked to it. C++ functions can declare typical signatures with integers, bools, and pointers, and appropriate conversion code is inserted by the compiler at calls. If the functions don’t store them, the runtime objects coming in as arguments require no special treatment. If the pointers to them are stored, they need to be registered with the GC.

The compiler automatically generates TypeScript function declarations from the C++ function prototypes with a specific annotation. Such annotations can also include instructions about how these are to be exposed as blocks.

Performance and what it means moving forward

Our compiler exhibits quite good performance. On benchmarks involving heavy object access, it’s at least 20 times faster than MicroPython or Duktape and less than two times slower than V8. On other benchmarks, such as floating point or heavy array access, it’s still two to eight times faster than interpreters. We’ve also implemented a custom byte-code interpreter for potential use on Xbox and iOS, which don’t allow dynamic code generation. It uses the same memory layout as the Arm back end and is about five times slower, though still much faster than other interpreters. This points to both the compiled nature and the memory layouts as causes of good performance. Complex MakeCode Arcade games run at about 30 frames per second on Arm Cortex-M4 MCUs. Extrapolating from our benchmarks, interpreters running the same code would run at an unplayable few frames per second.

We see statically typed languages such as STS playing an important role in the future of IoT, allowing embedded devices to run more efficiently—faster and, as a result, with reduced energy requirements—as well as providing programmers who aren’t embedded developers with an easier, higher-level alternative to the lower-level languages generally used to program these devices. But for now, we’re excited for kids just learning to code and advanced developers alike to have some fun creating games they want to play and share.

The post Rocket-fast embedded TypeScript for MakeCode Arcade appeared first on Microsoft Research.

Throwing back with MakeCode Arcade

Compilation and runtime system

Performance and what it means moving forward

Leave a Reply Cancel reply