SIGMOD 2025

The Best of Both Worlds: On Repairing Timestamps and Attribute Values for Multivariate Time Series

Jingyu Zhu, Weiwei Deng, Yu Sun, Shaoxu Song, Haiwei Zhang, Xiaojie Yuan

Abstract

Dirty data are often observed in multivariate time series. They not only degrade data quality but also adversely affect various downstream applications. Existing studies typically repair errors in either timestamps or attribute values alone, assuming that the other part is clean. In real scenarios, however, both timestamps and attribute values can be erroneous for various reasons. An intuitive approach is to repair timestamps and attribute values separately by calling different methods in turn. Such a strategy ignores the mutual reference between timestamps and attribute values, so it may over-repair the data and introduce additional errors. In this study, we instead repair attribute values and timestamps simultaneously. Our major contributions are: (1) defining multivariate speed constraints, formalizing the optimal repair problem, and proving its NP-hardness; (2) computing exact solutions with pruning strategies whose correctness is guaranteed; (3) designing a quadratic-time approximation algorithm with a performance guarantee; and (4) devising a linear-time algorithm with a guaranteed approximation bound. Empirical results over real-world dirty datasets demonstrate the superiority and practicality of our algorithms against eleven competing methods. Our algorithm not only achieves the best accuracy but also has the lowest time cost.