VLDB 2025

DobLIX: A Dual-Objective Learned Index for Log-Structured Merge Trees

Alireza Heidari, Amirhossein Ahmadi, Wei Zhang

4 citations

Abstract

In this paper, we introduce DobLIX, a dual-objective learned index (LI) designed specifically for Log-Structured Merge (LSM) tree-based key-value stores. Traditional LIs focus primarily on optimizing index lookups and often overlook the cost of data access from storage, which can become a significant performance bottleneck. In LSM-based systems, a considerable portion of the index is stored on disk, so lookup performance depends heavily on efficient coordination between in-memory structures and disk-resident data. Poorly optimized access patterns can lead to excessive I/O operations, which hurt read latency and overall system performance. DobLIX addresses this by incorporating a second objective, data access optimization, into the LI training process. This dual-objective approach minimizes both index lookup cost and data access cost, leading to significant improvements in read performance while maintaining write efficiency in real-world LSM systems. Additionally, DobLIX features a reinforcement learning agent that dynamically tunes system parameters, allowing it to adapt to varying workloads in real time. Experimental results on real-world datasets demonstrate that DobLIX reduces indexing overhead and improves throughput by 1.19× to 2.21× compared to state-of-the-art methods within RocksDB, a widely used LSM-based storage engine.
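To make the dual-objective idea concrete, the following is a minimal illustrative sketch, not DobLIX's actual formulation, which the abstract does not specify. It assumes a single linear model over sorted keys, a binary search within the model's error window as the lookup cost, and the worst-case number of fixed-size blocks touched by that window as the data access cost. The function names, the weights `w_lookup` and `w_io`, and the `block_size` parameter are all hypothetical.

```python
import math


def fit_linear(keys):
    """Least-squares fit of position ~ slope * key + intercept over sorted keys."""
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    if var == 0:
        return 0.0, mean_p
    cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
    slope = cov / var
    return slope, mean_p - slope * mean_k


def max_error(keys, slope, intercept):
    """Largest gap between a key's predicted and true position."""
    return max(abs(round(slope * k + intercept) - i) for i, k in enumerate(keys))


def dual_objective_cost(keys, block_size, w_lookup=1.0, w_io=10.0):
    """Combine an index lookup cost with a data access cost (hypothetical weights)."""
    slope, intercept = fit_linear(keys)
    err = max_error(keys, slope, intercept)
    window = 2 * err + 1                      # entries that may need to be searched
    lookup_cost = math.log2(window)           # binary search steps inside the window
    # Worst-case number of blocks a contiguous window of this size can touch.
    blocks = math.ceil((window - 1) / block_size) + 1
    return {
        "err": err,
        "lookup_cost": lookup_cost,
        "blocks": blocks,
        "cost": w_lookup * lookup_cost + w_io * blocks,
    }


if __name__ == "__main__":
    uniform = list(range(0, 100, 10))          # a linear model fits these exactly
    skewed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]  # one outlier degrades the fit
    print(dual_objective_cost(uniform, block_size=4))
    print(dual_objective_cost(skewed, block_size=4))
```

Under these assumptions, a model with a larger error bound is penalized twice: once for the longer in-window search and again for the extra blocks it may force the store to read. A lookup-only objective would see only the first penalty, which is the gap the abstract describes.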