Introducing EdgelessDB: A Database Designed for Confidential Computing

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.

This post is authored by Felix Schuster (Edgeless Systems).

Confidential computing is a breakthrough approach to data protection: sensitive workloads are run inside hardware-isolated and runtime-encrypted environments called enclaves. Enclaves can protect against threats like malware or rootkits and even rogue administrators and physical intruders. Azure confidential computing is at the forefront of this revolution, giving you the strongest data protection for your cloud workloads.

Edgeless Systems is on a mission to build easy-to-use open-source tools that make confidential computing accessible to everyone. Recently, we added EdgelessDB, the first open-source database designed for confidential computing, to our portfolio. In this post, we’re introducing EdgelessDB and showing how easily it can be run on Intel SGX-enabled confidential computing VMs in Azure.

The benefits of a confidential database

Let’s look at how databases like MariaDB or MySQL Server typically protect data: they apply access control at runtime and optionally encrypt data on disk. These are very reasonable mechanisms. However, they don’t protect against privileged attackers able to access a database’s memory. Such attackers include, for example, malicious administrators or rootkits. To mitigate this threat, many databases support dedicated hardware security modules (HSMs). While HSMs cannot protect data at runtime, they can at least protect the cryptographic keys used to encrypt data on disk.

EdgelessDB takes things one step further: by running entirely inside a secure enclave, its data, cryptographic keys, and code are always protected and encrypted – even at runtime. Thus, even highly privileged attackers cannot access the data. These are strong security properties, and they can even be verified remotely for any given instance of EdgelessDB.

In a nutshell, you don’t need to worry about your server machine being compromised, because EdgelessDB keeps all data securely inside an enclave.

The following table summarizes the security differences between EdgelessDB and normal databases.

Use cases

There are two main reasons to use EdgelessDB. First, EdgelessDB can greatly increase data security. For example, this may allow you to move more data to the cloud.

Second, EdgelessDB's manifest and verification feature enables exciting new applications like confidential analytics of customer data or trustworthy pooling of data between companies. For instance, one can use the manifest to define that only certain enclaves (identified by their TLS certificates) with certain functionalities can access the data. This way, one can, for example, build a system where sensitive customer data is protected by EdgelessDB and where only certain privacy-preserving AI training algorithms can run.

Using EdgelessDB

EdgelessDB is open-source software, and we provide free Docker images. Running EdgelessDB on enclave-enabled DCsv2 or DCsv3* VMs in Azure only requires a single command:

docker run -p3306:3306 -p8080:8080 --privileged -v /dev/sgx:/dev/sgx -t ghcr.io/edgelesssys/edgelessdb-sgx-1gb

If you like it even simpler, there is also a free offering in the Azure Marketplace.

Once it runs, EdgelessDB looks and feels just like a normal MySQL-compatible database. You can use it with your existing MySQL-compatible client software. There are, however, two significant differences:

You can only talk to EdgelessDB over TLS-secured connections.
You need to initialize EdgelessDB with a manifest via a REST API.

The manifest is a simple JSON file that defines the initial state and configuration of an EdgelessDB instance. Here is an example:

Here, the manifest creates a database “test” that is readable by a user “reader” and writable by a user “writer”. The manifest also defines a certificate authority (CA) to identify the users. Under “recovery”, the manifest has the public key of the party that can recover the database in case of disaster.
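As a rough illustration of the description above, the following Python sketch assembles a manifest of that shape and posts it to the REST API. The field names ("sql", "ca", "recovery"), the SQL statements, the certificate subject names, and the table "test.data" are assumptions made for this sketch, not a copy of the official example; check them against the EdgelessDB documentation before relying on them.

```python
import json
import ssl
import urllib.request


def build_manifest(ca_cert_pem: str, recovery_pubkey_pem: str) -> dict:
    """Assemble a manifest matching the description above.

    The field names and SQL statements are assumptions for illustration.
    """
    return {
        "sql": [
            # Users are identified by client certificates issued by the CA below.
            "CREATE USER reader REQUIRE ISSUER '/CN=Owner CA' SUBJECT '/CN=Reader'",
            "CREATE USER writer REQUIRE ISSUER '/CN=Owner CA' SUBJECT '/CN=Writer'",
            "CREATE DATABASE test",
            "CREATE TABLE test.data (i INT)",
            # "reader" may only read, "writer" may only insert.
            "GRANT SELECT ON test.data TO reader",
            "GRANT INSERT ON test.data TO writer",
        ],
        "ca": ca_cert_pem,
        "recovery": recovery_pubkey_pem,
    }


def upload_manifest(manifest: dict, url: str, cafile: str) -> int:
    """POST the manifest as JSON over TLS and return the HTTP status code."""
    context = ssl.create_default_context(cafile=cafile)
    request = urllib.request.Request(
        url,
        data=json.dumps(manifest).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, context=context) as response:
        return response.status
```

A call might look like upload_manifest(build_manifest(ca_pem, recovery_pem), "https://localhost:8080/manifest", "edb.pem"). The URL path and the certificate file name are hypothetical here; only port 8080 comes from the Docker command above.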

If you are familiar with blockchain, the manifest has some resemblance to a smart contract. It defines who can access the database and how. By leveraging the remote attestation capabilities of secure enclaves, clients can verify the manifest of an EdgelessDB instance.
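One way to picture the client-side check is as a fingerprint comparison: the client hashes the manifest it expects and compares the result with a fingerprint reported by the instance. The sketch below assumes that the fingerprint is a hex-encoded SHA-256 digest of the manifest bytes and that it was obtained over a connection already authenticated through remote attestation. Both are assumptions for illustration, and the attestation step itself is not shown.

```python
import hashlib
import hmac


def manifest_fingerprint(manifest_bytes: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the exact manifest bytes."""
    return hashlib.sha256(manifest_bytes).hexdigest()


def manifest_matches(local_manifest: bytes, reported_fingerprint: str) -> bool:
    """Compare the locally computed fingerprint with the reported one.

    compare_digest avoids leaking, via timing, how many characters matched.
    """
    expected = manifest_fingerprint(local_manifest)
    return hmac.compare_digest(expected, reported_fingerprint.strip().lower())
```

Because the digest covers the raw bytes, even a whitespace change in the manifest file yields a different fingerprint, so the client should hash exactly the file that was uploaded.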

You can learn more about the manifest concept in the EdgelessDB documentation.

A look under the hood

EdgelessDB shares a lot of code with MariaDB, but instead of using MariaDB’s default storage engine InnoDB, EdgelessDB uses a modified version of RocksDB. RocksDB is a high-performance storage engine developed by Facebook.

The main reason we chose RocksDB is that it stores data on disk in sorted string tables (SSTs). These SSTs are append-only and allow for efficient and position-dependent authenticated encryption. This is important to ensure the overall integrity of the database. Without position-dependent authenticated encryption, an attacker could possibly modify encrypted blocks or swap them within or between files. For the cryptography experts among you, we encrypt each ~4 KB block in an SST file separately using AES-GCM. As the initialization vector (IV), we use each block’s offset in the file, and each file has a unique key that is derived from its unique index and the database’s master key. The master key never leaves the enclave, and the index-to-file mapping is kept in a special encrypted meta file.
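To make the key and IV scheme more concrete, here is a minimal Python sketch of the two derivation steps only. It is not EdgelessDB's implementation: HMAC-SHA256 is a stand-in for whatever key derivation function is actually used, the label "sst-file-key" and both function names are hypothetical, and the AES-GCM call that would consume the key and IV is left out because Python's standard library does not provide it.

```python
import hashlib
import hmac
import struct

BLOCK_SIZE = 4096   # ~4 KB blocks, as described above
GCM_IV_LEN = 12     # 96-bit IV, the usual size for AES-GCM


def derive_file_key(master_key: bytes, file_index: int) -> bytes:
    """Derive a 256-bit per-file key from the master key and the file index.

    HMAC-SHA256 is an assumed stand-in KDF, chosen only for illustration.
    """
    info = b"sst-file-key" + struct.pack(">Q", file_index)
    return hmac.new(master_key, info, hashlib.sha256).digest()


def block_iv(block_offset: int) -> bytes:
    """Encode a block's byte offset within its file as a 12-byte big-endian IV."""
    return block_offset.to_bytes(GCM_IV_LEN, "big")


# Two files get unrelated keys; two blocks in one file get distinct IVs.
master = bytes(32)  # placeholder master key, for demonstration only
key_a = derive_file_key(master, 1)
key_b = derive_file_key(master, 2)
iv_first = block_iv(0)
iv_second = block_iv(BLOCK_SIZE)
```

With this arrangement, a block moved to another offset is decrypted under a different IV, and a block moved to another file is decrypted under a different key, so the AES-GCM authentication tag fails to verify in both cases. The append-only property matters as well: each offset in a file is written once, so an IV is never reused under the same key.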

Inside the enclave, everything runs on the Open Enclave SDK, which Microsoft contributed to the Confidential Computing Consortium and the Linux Foundation. Currently, EdgelessDB only supports Intel SGX enclaves, available in the Azure VM SKUs DCsv2 and DCsv3*. Note that enclave size limitations do not affect EdgelessDB’s storage capabilities but may affect performance.

Here is a sketch of the architecture:

Benchmarks

Given all the extra security, EdgelessDB v0.1 has surprisingly small overhead. To measure it, we use the standard OLTP benchmark TPC-C, which models inventory tracking at a wholesaler with multiple warehouses. As a testbed, we use an Azure DCsv3 VM (in limited preview as of September 2021) with 16 cores. We configure tpcc.lua to simulate 10 warehouses with 10 tables each and use 8 threads on the client side.

We compare EdgelessDB v0.1 against its closest relative: MariaDB v10.5.11 with MyRocks storage engine. The preliminary results are shown below.

While the current performance should already be satisfactory for most applications, we are confident that we can soon bring the overhead down to single-digit percentages. In fact, EdgelessDB v0.1 already outperforms standard MariaDB (with InnoDB as storage engine) in this benchmark.

What’s next?

By now, we hope that you are as excited about the confidential database concept as we are! We’d love to get your feedback and hear about your use cases. You can find me (Felix Schuster) on Twitter or LinkedIn. Visit https://edgeless.systems/ to learn more about EdgelessDB and our other open-source tools for confidential computing.

*DCsv3 virtual machines are the next generation of Intel SGX virtual machines, providing a higher core count and larger enclave cache sizes. At the time of this post, they are still in preview and not recommended for production workloads. Learn more here.