SIGMOD 2025
Cleaning Time Series under Seasonal and Trend Constraints
Zijie Chen, Aoqian Zhang, Shaoxu Song
Abstract
Time series data are often dirty, e.g., due to anomalies or sensor failures. Such dirty data hinder downstream analysis tasks such as forecasting, clustering, and classification. Simply discarding the potentially dirty data points is not an option either, since doing so leaves the time series incomplete and incompatible with machine learning models. While many time series cleaning techniques have been developed in the last decade, e.g., using constraints on value fluctuation, seasonal features have surprisingly been ignored. In this paper, we propose to clean time series by first capturing seasonal and trend constraints, and then enforcing them for cleaning. Unfortunately, directly applying existing seasonal-trend decomposition methods turns out to be imprecise (the decomposition itself is affected by errors) and incomplete (the components are not computed at the beginning or end of the series). Moreover, unlike cleaning under simple value fluctuation constraints, which can be done efficiently, the time series cleaning problem with seasonal and trend constraints is proved to be NP-complete. We therefore first improve the seasonal and trend filters so that they tolerate errors and extend in both directions. Then, we design an efficient heuristic that iteratively repairs the time series and refines the seasonal and trend constraints. The approach has been integrated as a built-in function into the Apache IoTDB system. Experiments on real-world datasets demonstrate the superiority of our proposal in cleaning seasonal time series and in improving downstream applications.
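To make the "decompose, repair, refine" loop described in the abstract concrete, the following is a minimal sketch under simplifying assumptions; it is not the authors' algorithm. It assumes an additive model x = trend + seasonal + residual with a known period, uses a running median for the trend (truncated at the boundaries so that both ends of the series receive a value) and a per-phase median for the seasonal component as stand-ins for the paper's improved filters, and treats a hypothetical residual bound `tau` as the "constraint". Violating points are repaired with minimum change by clipping them into [fit - tau, fit + tau], and the decomposition is recomputed until no violation remains. The names `decompose`, `clean`, `tau`, and `max_iter` are illustrative only.

```python
from statistics import median


def decompose(x, period):
    """Simple additive decomposition sketch (hypothetical stand-in filters).

    Trend: running median over a window of period//2 points on each side,
    truncated at the boundaries so both ends of the series get a value.
    Seasonal: median detrended value for each phase (i mod period).
    """
    n = len(x)
    half = period // 2
    trend = [median(x[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]
    detrended = [x[i] - trend[i] for i in range(n)]
    phase_median = [median(detrended[p::period]) for p in range(period)]
    seasonal = [phase_median[i % period] for i in range(n)]
    return trend, seasonal


def clean(x, period, tau, max_iter=10):
    """Iteratively enforce |x[i] - trend[i] - seasonal[i]| <= tau.

    Each round re-estimates trend and seasonal components from the current
    (partially repaired) series, then clips every violating point to the
    nearest value allowed by the bound (a minimum-change repair).
    """
    if period < 2 or len(x) < period:
        raise ValueError("need period >= 2 and at least one full period")
    y = list(x)
    for _ in range(max_iter):
        trend, seasonal = decompose(y, period)
        changed = False
        for i in range(len(y)):
            fit = trend[i] + seasonal[i]
            residual = y[i] - fit
            if residual > tau:
                y[i] = fit + tau
                changed = True
            elif residual < -tau:
                y[i] = fit - tau
                changed = True
        if not changed:  # all points satisfy the residual bound
            break
    return y
```

For example, with a period-4 pattern and a single spike, only the spike is pulled back within `tau` of the fitted seasonal-plus-trend value:

```python
x = [10 + [0, 1, 0, -1][i % 4] for i in range(24)]
dirty = list(x)
dirty[10] += 50
cleaned = clean(dirty, period=4, tau=2.0)
```

Using medians rather than means in both filters is one simple way to keep a single large error from distorting the estimated components, which loosely mirrors the error-tolerance goal stated above.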