VLDB 2025

STsCache: An Efficient Semantic Caching Scheme for Time-series Data Workloads Based on Hybrid Storage

Tao Kong, Hui Li, Yuxuan Zhao, Liping Li, Xiyue Gao, Qilong Wu, Jiangtao Cui

Abstract

The growing demand for extreme-scale time-series data workloads in data centers calls for a high-performance semantic caching system that leverages the semantics and results of historical queries to answer new time-series queries. Existing caching solutions either ignore query semantics, yielding suboptimal performance, or target only specific scenarios, offering small capacity and limited functionality. In this paper, we summarize the query patterns of time-series data workloads and, for the first time, propose a definition of semantic time-series caching. Accordingly, we present STsCache, a semantic time-series caching system built on a hybrid storage model that combines memory and NVMe SSD. We propose a series of optimization strategies, including slab-based semantic data management, a semantic index, semantic value-driven batch eviction, time-aware deduplication insertion, and lazy compaction. We implemented STsCache and evaluated it with benchmarks and in production environments. STsCache increases the throughput of popular time-series databases (InfluxDB, TimescaleDB) by 4.8–10.8× and reduces latency by 79.9%–93.5%. Compared with the latest time-series caching schemes (TSCache, BSCache), STsCache increases throughput by 1.5–4.5×, reduces latency by 59.4%–81.9%, and increases hit ratios by 22.5%–82.4%.