CVPR 2025

MEET: Towards Memory-Efficient Temporal Sparse Deep Neural Networks

Zeqi Zhu, Ibrahim Batuhan Akkaya, Luc Waeijen, Egor Bondarev, Arash Pourtaherian, Orlando Moreira

Abstract

Deep Neural Networks (DNNs) are accurate but compute-intensive, leading to substantial energy consumption during inference. Exploiting temporal redundancy through ∆-Σ convolution [26] in video processing has proven to greatly enhance computation efficiency. However, temporal ∆-Σ DNNs typically require substantial memory for storing neuron states to compute inter-frame differences, hindering their on-chip deployment. To mitigate this memory cost, directly compressing the states can disrupt the linearity of temporal ∆-Σ convolution, causing accumulated errors in long-term ∆-Σ processing. Thus, we propose MEET, an optimization framework for MEmory-Efficient Temporal ∆-Σ DNNs. MEET transfers the state compression challenge to a well-established weight compression problem by trading fewer activations for more weights and introduces a co-design of network architecture and suppression method to optimize for mixed spatial-temporal execution. Evaluations on three vision applications demonstrate a reduction of 5.1∼13.3× in total memory compared to the most computation-efficient temporal DNNs, while preserving the computation efficiency and model accuracy in long-term ∆-Σ processing. MEET facilitates the deployment of temporal ∆-Σ DNNs within on-chip memory of embedded event-driven platforms, empowering low-power edge processing.
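
The linearity argument above can be illustrated with a toy example. The following NumPy sketch is not the MEET method and is not taken from the paper: it assumes a single linear layer, a uniform rounding quantizer as a stand-in for state compression, and hypothetical names (`quantize`, `run_delta_sigma`, `state_step`). With exact states, accumulating `W @ (x_t - x_{t-1})` reproduces the dense output `W @ x_t`. Once the stored previous input is compressed lossily, the deltas no longer telescope and the output drifts away from the dense result.

```python
import numpy as np


def quantize(x, step):
    """Toy lossy compression (assumption): round to a uniform grid."""
    return np.round(x / step) * step


def run_delta_sigma(frames, weight, state_step=None):
    """Run a Delta-Sigma linear layer over a sequence of frames.

    frames:     array of shape (T, n_in), one input vector per time step.
    weight:     array of shape (n_out, n_in).
    state_step: if given, the stored previous input is quantized with
                this step size before being reused (hypothetical knob).
    Returns an array of shape (T, n_out) with the accumulated outputs.
    """
    prev_input = np.zeros(frames.shape[1])  # Delta state: last stored input
    accum = np.zeros(weight.shape[0])       # Sigma state: running output
    outputs = []
    for x in frames:
        delta = x - prev_input              # inter-frame difference
        accum = accum + weight @ delta      # linear update on the delta only
        outputs.append(accum.copy())
        # Store the input for the next step, exactly or lossily.
        prev_input = x if state_step is None else quantize(x, state_step)
    return np.stack(outputs)


rng = np.random.default_rng(0)
T, n_in, n_out = 200, 16, 8
# Slowly varying input (random walk) to mimic temporal redundancy.
frames = np.cumsum(0.01 * rng.standard_normal((T, n_in)), axis=0)
weight = rng.standard_normal((n_out, n_in))

dense = frames @ weight.T                   # frame-by-frame reference
exact = run_delta_sigma(frames, weight)     # exact stored states
lossy = run_delta_sigma(frames, weight, state_step=0.05)

# Per-frame worst-case deviation from the dense reference.
err_exact = np.abs(exact - dense).max(axis=1)
err_lossy = np.abs(lossy - dense).max(axis=1)
for t in (1, 50, 199):
    print(t, err_exact[t], err_lossy[t])
```

In this sketch the lossy output at step t differs from the dense one by the weight matrix applied to the sum of all earlier quantization errors, so the deviation is carried forward from frame to frame rather than being corrected. This is one simple way to see why the abstract treats direct state compression as problematic for long-term ∆-Σ processing.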