SIGMOD 2025
From Suspicious Errors to Valid Data: On Repairing Spatio-Temporal Data via Spatial and Temporal Dependencies
Weiwei Deng, Yu Sun, Shaoxu Song, Xiaojie Yuan
Abstract
Spatio-temporal data collected from geographically distributed sources often contain dirty values that degrade downstream applications. Temporal data repairing methods, e.g., those based on speed constraints, may mistakenly treat sudden changes as errors even when they represent real events that occur simultaneously at multiple locations. Spatial data repairing approaches emphasize value consistency across different locations but ignore temporal pattern similarity. Meanwhile, existing spatio-temporal repairing methods focus more on spatial error correction than on repairing temporal values across locations. Therefore, we use both temporal and spatial dependencies to identify and repair spatio-temporal errors. Our main contributions are: (1) formalizing the optimal spatio-temporal data repairing problem under constraints and proving its NP-hardness; (2) designing an exact algorithm that decomposes the global repair into local decisions and applies pruning methods; (3) developing two approximate algorithms with theoretical guarantees and probabilities of hitting the optimal solution, where the first explores a wider search space for higher accuracy and the second uses a greedy sliding-window strategy to improve efficiency; and (4) conducting experiments on nine real-world datasets and downstream applications against eleven baselines, which demonstrate the superiority and practicability of our methods.
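To make the motivation concrete, the following is a minimal sketch of how a purely temporal speed constraint can flag a real event, and how checking neighboring locations could separate it from an isolated error. It is not the paper's algorithm: the function names, the `s_max` bound, the `min_support` threshold, the neighbor map, and the toy readings are all assumptions chosen for illustration, and the sketch only detects suspicious points without repairing them.

```python
from typing import Dict, List, Set


def speed_violations(series: List[float], s_max: float) -> Set[int]:
    """Indices t where |x[t] - x[t-1]| exceeds s_max (unit time steps assumed)."""
    return {
        t for t in range(1, len(series))
        if abs(series[t] - series[t - 1]) > s_max
    }


def suspicious_errors(
    data: Dict[str, List[float]],
    neighbors: Dict[str, List[str]],
    s_max: float,
    min_support: float = 0.5,  # hypothetical threshold, not from the paper
) -> Dict[str, Set[int]]:
    """Keep a temporal violation as suspicious only if too few neighbors
    show a violation at the same timestamp."""
    violations = {loc: speed_violations(xs, s_max) for loc, xs in data.items()}
    result: Dict[str, Set[int]] = {}
    for loc, viol in violations.items():
        nbrs = neighbors.get(loc, [])
        kept = set()
        for t in viol:
            agree = sum(1 for n in nbrs if t in violations[n])
            # With no neighbors there is no spatial evidence, so keep the flag.
            if not nbrs or agree / len(nbrs) < min_support:
                kept.add(t)
        result[loc] = kept
    return result


# Toy readings: all three locations jump at t=2 (a shared, real change),
# while location C additionally has an isolated spike at t=3.
data = {
    "A": [10.0, 10.0, 20.0, 20.0, 20.0],
    "B": [11.0, 11.0, 21.0, 21.0, 21.0],
    "C": [10.0, 10.0, 20.0, 50.0, 20.0],
}
neighbors = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}

temporal_only = {loc: speed_violations(xs, 5.0) for loc, xs in data.items()}
flagged = suspicious_errors(data, neighbors, s_max=5.0)
```

In this toy setting the temporal-only check flags the shared jump at t=2 at every location, whereas the spatially corroborated check keeps only the transitions into and out of the isolated spike at location C.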