LST-Bench: A new benchmark tool for open table formats in the data lake

This post has been republished via RSS; it originally appeared at: Microsoft Research.

This paper was presented at the ACM SIGMOD/Principles of Database Systems Conference (opens in new tab) (SIGMOD/PODS 2024), the premier forum on large-scale data management and databases.

As organizations grapple with ever-expanding datasets, the adoption of data lakes has become a vital strategy for scalable and cost-effective data management. The success of these systems largely depends on the file formats used to store the data. Traditional formats, while efficient in data compression and organization, falter with frequent updates. Advanced table formats like Delta Lake, Apache Iceberg, and Apache Hudi offer promising solutions with easier data modifications and historical tracking, yet their efficacy lies in their ability to handle continuous updates, a challenge that requires extensive and thorough evaluation.

Our paper, “LST-Bench: Benchmarking Log-Structured Tables in the Cloud (opens in new tab),” presented at SIGMOD 2024, introduces an innovative tool designed to evaluate the performance of different table formats in the cloud. LST-Bench builds on the well-established TPC-DS (opens in new tab) benchmark—which measures how efficiently systems handle large datasets and complex queries—and includes features specifically designed for table formats, simplifying the process of testing them under real-world conditions. Additionally, it automatically conducts tests and collects essential data from both the computational engine and various cloud services, enabling accurate performance evaluation.

Flexible and adaptive testing

Designed for flexibility, LST-Bench adapts to a broad range of scenarios, as illustrated in Figure 1. The framework was developed by incorporating insights from engineers, facilitating the integration of existing workloads like TPC-DS, while promoting reusability. For example, each test session establishes a new connection to the data-processing engine, organizing tasks as a series of statements. This setup permits developers to run multiple tasks either sequentially within a single session or concurrently across various sessions, reflecting real-world application patterns.

A diagram showing workload components in LST-Bench and their relationships. — Figure 1. Workload components in LST-Bench and their relationships. A task is a sequence of SQL statements, while a session is a sequence of tasks that represents a logical unit of work or a user session. A phase is a group of concurrent sessions that must be completed before the next phase can start. Lastly, a workload is a sequence of phases.

The TPC-DS workload comprises the following foundational tasks:

Load task: Loads data into tables for experimentation.
Single User task: Executes complex queries to test the engine’s upper performance limit.
Data Maintenance task: Handles data insertions and deletions.

LST-Bench introduces the following tasks specific to table formats:

Optimize task: Compacts the data files within a table.
Time Travel task: Enables querying data as it appeared at a specified point in the past.
Parameterized Custom task: Allows for the integration of user-defined code to create dynamic workflows.

These features enable LST-Bench to evaluate aspects of table formats that are not covered by TPC-DS, providing deeper insights into their performance, as shown in Figure 2.

A diagram illustrating various LST-Bench tasks combined to create workloads that provide insights into table formats. The workloads assess the handling of frequent data modifications over time, optimizing tables for multiple modifications of varying sizes, managing simultaneous reading and writing sessions, querying data across different time points, and evaluating the impact of batch size variations on read query performance. — Figure 2. LST-Bench expands on TPC-DS by introducing a flexible workload representation and incorporating extensions that help users gain insights into table formats previously overlooked by the original benchmark.

A degradation rate metric to measure stability

In addition to these workload extensions, LST-Bench introduces new metrics to evaluate table formats both comprehensively and fairly. It retains the traditional metric categories like performance, storage, and compute efficiency, and it adds a new stability metric called degradation rate. This new metric specifically addresses the impact of accumulating small files in the data lake—a common issue arising from frequent, small updates—providing an assessment of the system’s efficiency over time.

The degradation rate is calculated by dividing a workload into different phases. The degradation rate \(S_{DR}\) is defined as follows:

\(S_{DR}={1\over n}\sum\limits_{i=1}^n\dfrac{M_{i} – M_{i-1}}{M_{i-1}}\)

Here, \(M_i\) represents the performance or efficiency metric value of the \(i^{th}\) iteration of a workload phase, and \(n\) reflects the total number of iterations of that phase. Intuitively, \(S_{DR}\) is the rate at which a metric grows or shrinks, reflecting cumulative effects of changes in the underlying system’s state. This rate provides insight into how quickly a system degrades over time. A stable system demonstrates a low \(S_{DR}\), indicating minimal degradation.

LST-Bench implementation

The LST-Bench features a Java-based client application that runs SQL workloads on various engines, enabling users to define tasks, sessions, and phase libraries to reuse different workload components. This allows them to reference these libraries in their workload definitions, add new task templates, or create entirely new task libraries to model-specific scenarios.

LST-Bench also includes a processing module that consolidates experimental results and calculates metrics to provide insights into table formats and engines. It uses both internal telemetry from LST-Bench and external telemetry from cloud services, such as resource utilization, storage API calls, and network I/O volume. The metrics processor offers multiple visualization options, including notebooks and a web app, to help users analyze performance data effectively.

An illustration depicting the components and execution model of the LST-Bench tool. The Client Application establishes connections with engines via dedicated drivers, while the Metrics Processor gathers telemetry from the Client Application, engines, and other cloud services. This data is aggregated and visualized using either a notebook or web application. — Figure 3. The LST-Bench tool components and execution model.

Implications and looking ahead

LST-Bench integrates seamlessly into the testing workflows of the Microsoft Fabric (opens in new tab) warehouse, allowing that team to rigorously assess engine performance, evaluate releases, and identify any issues. This leads to a more reliable and optimized user experience on the Microsoft Fabric data analytics platform. Additionally, LST-Bench holds promise as a foundational tool for various Microsoft initiatives. It’s currently instrumental in research projects focused on improving data organization for table formats, with the goal of increasing the performance of customer workloads on Microsoft Fabric. LST-Bench is also being used to evaluate the performance of table formats converted using Apache XTable (Incubating) (opens in new tab), an open-source tool designed to prevent data silos within data lakes.

LST-Bench is open source (opens in new tab), and we welcome contributors to help expand this tool, making it highly effective for organizations to thoroughly evaluate their table formats.

Acknowledgements

We would like to thank Joyce Cahoon (opens in new tab) and Yiwen Zhu (opens in new tab) for their valuable discussions on the stability metric, and Jose Medrano (opens in new tab) and Emma Rose Wirshing (opens in new tab) for their feedback on LST-Bench and their work on integrating it with the Microsoft Fabric Warehouse.

The post LST-Bench: A new benchmark tool for open table formats in the data lake appeared first on Microsoft Research.

Flexible and adaptive testing

A degradation rate metric to measure stability

LST-Bench implementation

Implications and looking ahead

Microsoft at FAccT 2024: Advancing responsible AI research and practice

Leave a Reply Cancel reply