VLDB2025
Beyond Compression: A Comprehensive Evaluation of Lossless Floating-Point Compression
Kaisei Hishida, Chunwei Liu, John Paparrizos, Aaron J. Elmore
被引用 8 次
摘要
Modern data-intensive applications generate vast amounts of floating-point data, essential for fields like databases and machine learning. While many compression techniques focus on space efficiency, there is a lack of benchmarks evaluating both compression and query performance, especially in areas like in-situ query execution on compressed data and machine learning tasks such as distance measurement and k-nearest neighbors (k-NN) in Retrieval-Augmented Generation (RAG) systems. This paper addresses this gap by evaluating popular lossless floating-point compression methods on three key factors: compression efficiency, database operations performance, and machine learning query performance. We implemented these techniques in Rust and integrated them into an open-source library for use with columnar engines. Our comparison highlights trade-offs between compression efficiency and query performance, showing that no single approach excels in all areas, and some methods trade off compression for slower performance.