CVPR 2025
MEET: Towards Memory-Efficient Temporal Sparse Deep Neural Networks
Zeqi Zhu, Ibrahim Batuhan Akkaya, Luc Waeijen, Egor Bondarev, Arash Pourtaherian, Orlando Moreira
Abstract
Deep Neural Networks (DNNs) are accurate but compute-intensive, leading to substantial energy consumption during inference. Exploiting temporal redundancy in video processing through ∆-Σ convolution [26] has been shown to greatly improve computation efficiency. However, temporal ∆-Σ DNNs typically require substantial memory to store the neuron states used to compute inter-frame differences, which hinders their on-chip deployment. Directly compressing the states to reduce this memory cost can disrupt the linearity of temporal ∆-Σ convolution, causing errors to accumulate in long-term ∆-Σ processing. We therefore propose MEET, an optimization framework for MEmory-Efficient Temporal ∆-Σ DNNs. MEET transfers the state compression challenge to the well-established weight compression problem by trading fewer activations for more weights, and introduces a co-design of network architecture and suppression method to optimize for mixed spatial-temporal execution. Evaluations on three vision applications demonstrate a 5.1× to 13.3× reduction in total memory compared to the most computation-efficient temporal DNNs, while preserving computation efficiency and model accuracy in long-term ∆-Σ processing. MEET facilitates the deployment of temporal ∆-Σ DNNs within the on-chip memory of embedded event-driven platforms, enabling low-power edge processing.
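The linearity argument above can be illustrated with a minimal sketch. It is not the MEET implementation: a dense matrix `W` stands in for a convolution (both are linear), floor quantization with a hypothetical step of 0.25 stands in for lossy state compression, and the names `run_delta_sigma` and `floor_quantize` are assumptions made for illustration. With exact states, accumulating `W @ delta` reproduces the dense result `W @ x` for every frame; with compressed states, each frame's compression error is added to the accumulator and never removed, so the output drifts.

```python
import numpy as np

def run_delta_sigma(frames, W, compress=None):
    """Process frames incrementally and return, per frame, the max absolute
    error of the accumulated output against dense recomputation W @ x."""
    # Stored input state used to form the next inter-frame difference.
    state = frames[0] if compress is None else compress(frames[0])
    acc = W @ frames[0]              # first frame is processed densely
    errors = [0.0]
    for x in frames[1:]:
        delta = x - state            # delta: inter-frame difference
        acc = acc + W @ delta        # sigma: accumulate the linear response
        state = x if compress is None else compress(x)
        errors.append(float(np.max(np.abs(acc - W @ x))))
    return errors

def floor_quantize(x, step=0.25):
    """Hypothetical lossy state compression: truncate to a coarse grid."""
    return np.floor(x / step) * step

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))      # stand-in for a linear conv layer

# Slowly varying synthetic "video": each frame is a small perturbation
# of the previous one.
frames = [rng.standard_normal(8)]
for _ in range(49):
    frames.append(frames[-1] + 0.05 * rng.standard_normal(8))

exact_err = run_delta_sigma(frames, W)
lossy_err = run_delta_sigma(frames, W, compress=floor_quantize)

print("max error, exact states:     ", max(exact_err))
print("final error, compressed states:", lossy_err[-1])
```

In this toy setting the error with exact states stays at floating-point rounding level, while the error with compressed states is the linear response to the sum of all past quantization residuals. This is the accumulation effect that motivates moving the compression burden from states to weights.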