AAAI 2024
Wavelet-Driven Spatiotemporal Predictive Learning: Bridging Frequency and Time Variations
Xuesong Nie, Yunfeng Yan, Siyuan Li, Cheng Tan, Xi Chen, Haoyuan Jin, Zhihang Zhu, Stan Z. Li, Donglian Qi
31 citations
Abstract
Spatiotemporal predictive learning is a paradigm in which models learn spatial and temporal patterns by predicting future frames from past frames in an unsupervised manner. Existing methods typically rely on recurrent units to capture long-term dependencies, but these units often incur high computational costs and deliver limited performance in real-world scenes. This paper presents a Wavelet-based SpatioTemporal (WaST) framework, which extracts and adaptively controls both low- and high-frequency components at the image and feature levels via a 3D discrete wavelet transform, enabling faster processing while maintaining high-quality predictions. We propose a Time-Frequency Aware Translator designed to efficiently learn short- and long-range spatiotemporal information by separately modeling spatial frequency and temporal variations. In addition, we design a wavelet-domain High-Frequency Focal Loss that supervises high-frequency variations. Extensive experiments across real-world scenarios, including driving scene prediction, traffic flow prediction, human motion capture, and weather forecasting, show that WaST achieves state-of-the-art performance compared with a range of spatiotemporal prediction methods.
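
To make the 3D discrete wavelet transform mentioned in the abstract concrete, here is a minimal sketch of a single-level 3D decomposition of a short video clip into low- and high-frequency subbands. It is an illustration only, not the WaST implementation: the abstract does not say which wavelet is used, so the Haar wavelet is an assumption, and the names `haar_split` and `haar_dwt3d`, the (T, H, W) layout, and the random test clip are all hypothetical.

```python
import numpy as np


def haar_split(x, axis):
    """One Haar analysis step along `axis`; returns (low, high) half-length bands."""
    even = np.take(x, np.arange(0, x.shape[axis], 2), axis=axis)
    odd = np.take(x, np.arange(1, x.shape[axis], 2), axis=axis)
    low = (even + odd) / np.sqrt(2.0)   # local average: low-frequency content
    high = (even - odd) / np.sqrt(2.0)  # local difference: high-frequency content
    return low, high


def haar_dwt3d(clip):
    """Single-level 3D Haar DWT of a (T, H, W) array with even dimensions.

    Returns a dict of eight subbands keyed 'LLL', 'LLH', ..., 'HHH', where
    the letters give the filter applied along T, H, and W in that order.
    """
    if clip.ndim != 3 or any(s % 2 for s in clip.shape):
        raise ValueError("expected a 3D array with even-sized dimensions")
    bands = {"": clip.astype(np.float64)}
    for axis in range(3):  # filter along time, then height, then width
        split = {}
        for name, arr in bands.items():
            low, high = haar_split(arr, axis)
            split[name + "L"] = low
            split[name + "H"] = high
        bands = split
    return bands


# Hypothetical input: 4 frames of 8x8 pixels (assumed data, for illustration).
rng = np.random.default_rng(0)
clip = rng.random((4, 8, 8))
bands = haar_dwt3d(clip)
for name in sorted(bands):
    print(name, bands[name].shape)
```

Each subband is half the size of the input along every axis, which is one reason working in the wavelet domain can reduce computation. 'LLL' holds the coarse spatiotemporal content, while the other seven subbands hold the high-frequency detail along at least one axis, the kind of component a wavelet-domain loss could weight more heavily.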