KDD2026

TimeDistill: Efficient Long-Term Time Series Forecasting with MLP via Cross-Architecture Distillation

Juntong Ni, Zewen Liu, Shiyu Wang, Ming Jin, Wei Jin

被引用 19 次

摘要

Transformer-based and CNN-based methods demonstrate strong performance in long-term time series forecasting. However, their high computational and storage requirements can hinder large-scale deployment. To address this limitation, we propose integrating lightweight MLP with advanced architectures using knowledge distillation (KD). Our preliminary study reveals different models can capture complementary patterns, particularly multi-scale and multiperiod patterns in the temporal and frequency domains. Based on this observation, we introduce TimeDistill, a cross-architecture KD framework that transfers these patterns from teacher models (e.g., Transformers, CNNs) to MLP. Additionally, we provide a theoretical analysis, demonstrating that our KD approach can be interpreted as a specialized form of mixup data augmentation. TimeDistill improves MLP performance by up to 18.6%, surpassing teacher models on eight datasets. It also achieves up to 7× faster inference and requires 130× fewer parameters. Furthermore, we conduct extensive evaluations to highlight the versatility and effectiveness of TimeDistill. The code is available at Github Code Repo. CCS Concepts • Information systems → Temporal data; • Mathematics of computing → Time series analysis; • Computing methodologies → Neural networks.