SIGMOD 2025
Rapid Data Ingestion through DB-OS Co-design
Kyungmin Lim, Minseok Yoon, Kihwan Kim, Alan David Fekete, Hyungsoo Jung
Abstract
Sequential data access for the rapid ingestion of large fact tables from storage is a pivotal yet resource-intensive operation in data warehouse systems, consuming substantial CPU cycles across various components of DBMSs and operating systems. Bypassing these layers can eliminate the access latency they add, but because such data transfers also bypass the cache, concurrent access to the same table often results in redundant data fetching. A new design for data access control is therefore needed to enable rapid data ingestion in databases. To address this need, we propose a novel DB-OS co-design that efficiently supports sequential data access at full device speed. Our approach, zicIO, frees DBMSs from data access control by preparing the required data just before the DBMS accesses it, while alleviating all known I/O latencies. The core of zicIO is its DB-OS co-design, which aims to (1) automate data access control and (2) reduce redundant data fetching through seamless collaboration between the DB and the OS. We implemented zicIO and integrated it with four databases to demonstrate its general applicability. Our evaluation showed performance improvements of up to 9.95× under TPC-H workloads.
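To make the redundant-fetching problem concrete, the following toy model counts device reads when several scans sequentially read the same table of fixed-size chunks. It is a simplified illustration under stated assumptions and does not reproduce zicIO's actual mechanism. In the hypothetical `device_reads_bypass`, each scan performs cache-bypassing reads into a private buffer, so nothing is reused across scans. In the hypothetical `device_reads_shared`, the scans share a single buffer slot that is filled just before the data is needed. Both functions assume that the scans advance in lockstep.

```python
def device_reads_bypass(num_scans: int, num_chunks: int) -> int:
    """Hypothetical model: every scan reads every chunk directly from the
    device into its own private buffer, so each chunk is fetched once per scan."""
    reads = 0
    for _chunk in range(num_chunks):
        for _scan in range(num_scans):
            reads += 1  # private, cache-bypassing read; no reuse across scans
    return reads


def device_reads_shared(num_scans: int, num_chunks: int) -> int:
    """Hypothetical model: scans advance in lockstep over one shared buffer slot
    that is filled just before the scans consume it; only the first scan to
    request a chunk triggers a device read."""
    resident = None  # index of the chunk currently held in the shared slot
    reads = 0
    for chunk in range(num_chunks):
        for _scan in range(num_scans):
            if resident != chunk:
                resident = chunk  # fetch the chunk once, then serve later requests from memory
                reads += 1
    return reads


if __name__ == "__main__":
    # Four concurrent scans over a 1000-chunk table.
    print(device_reads_bypass(4, 1000), device_reads_shared(4, 1000))
```

Under these assumptions, four concurrent scans over 1000 chunks issue 4000 device reads in the bypass model but only 1000 in the shared model. Real workloads have scans at different positions and bounded memory, so the actual savings depend on how well concurrent accesses can be aligned.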