WWW2026

MASI: Memory-Adaptive Inference Framework for Spiking Neural Networks on Edge Devices

Di Yu, Helin Zheng, Changze Lv, Xin Du, Linshan Jiang, Xiang Liu, Gang Pan, Shuiguang Deng

Abstract

The rapid growth of Internet of Things (IoT) applications calls for resource-efficient computing paradigms that can unify heterogeneous sensing modalities. Spiking Neural Networks (SNNs) meet this need through event-driven, energy-efficient processing. However, deploying SNNs on mobile and embedded platforms is hindered by strict and fluctuating memory budgets. Prior work explores lightweight model design and system-level memory management, but these methods either sacrifice accuracy or incur high runtime overhead due to timestep-dependent dynamics. To tackle these challenges, we propose MASI, a memory-adaptive framework that enables efficient on-device SNN inference by combining (1) a fine-grained memory-adaptive layer slicing strategy, (2) a timestep-agnostic scheduler that maximizes memory utilization with minimal fragmentation, and (3) a timestep-aware early-exit mechanism that reduces redundant computation. Evaluated on diverse workloads and edge devices, MASI dynamically adapts to runtime memory availability, reducing memory usage by approximately 20.67% and inference latency by approximately 58.53% on average, with negligible accuracy loss, compared to other feasible on-device implementations under memory constraints.