EMNLP 2025
Efficient Unstructured Pruning of Mamba State-Space Models for Resource-Constrained Environments
Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma
Cited by 1
Abstract
As AI deployment shifts to edge devices, efficient sequence modeling becomes critical. State-space models (SSMs), particularly Mamba, rival Transformers with linear-time complexity and strong performance across tasks, yet their large parameter counts hinder use in resource-constrained settings. We propose a novel unstructured pruning framework tailored for Mamba that achieves up to 70% parameter reduction with only a 3–9% performance loss. Unlike Transformer-focused pruning, our approach leverages Mamba's recurrent dynamics through: (1) pruning based on weight and gradient importance to preserve critical parameters, (2) a gradual pruning process to ensure model stability, and (3) a global strategy that optimizes parameter allocation across the model. Extensive experiments on the WikiText-103, Long Range Arena, and ETT benchmarks show significant efficiency gains, with 1.77× faster inference and 46% less memory. Our component analysis reveals Mamba's robustness to pruning, enabling practical deployment, although use in sensitive applications still requires care to avoid biases.

…environments. State-space models (SSMs) (Gu et al., 2020a, 2021; Gupta et al., 2022) offer a promising alternative with linear-time complexity while effectively modeling long-range dependencies. The Mamba architecture (Gu and Dao, 2023) distinguishes itself through its selective mechanism that dynamically controls information flow based …

* Equal contribution.
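The abstract names three ingredients (weight-and-gradient importance, gradual pruning, global allocation) but gives no formulas. The following is a minimal sketch of how such ingredients are commonly combined, under stated assumptions, and not the authors' implementation. The importance score is assumed to blend the weight magnitude |w| with the first-order Taylor saliency |w·g|, weighted by a hypothetical `alpha`. Sparsity is assumed to follow a cubic ramp up to 70% (the headline reduction). Pruning uses a single threshold ranked across all layers. The layer names `in_proj` and `x_proj` and the random gradients are illustrative placeholders.

```python
import numpy as np

def importance(weight, grad, alpha=0.5):
    # Assumed score: blend of weight magnitude |w| and first-order Taylor
    # saliency |w * g|; alpha is a hypothetical mixing weight.
    return alpha * np.abs(weight) + (1.0 - alpha) * np.abs(weight * grad)

def cubic_schedule(step, total_steps, final_sparsity):
    # Assumed cubic ramp: sparsity rises from 0 to final_sparsity,
    # pruning most aggressively in the early steps.
    frac = min(step / total_steps, 1.0)
    return final_sparsity * (1.0 - (1.0 - frac) ** 3)

def global_masks(scores, sparsity):
    # Rank scores from all layers together, so layers compete for the
    # parameter budget instead of each losing the same fraction.
    names = list(scores)
    flat = np.concatenate([scores[n].ravel() for n in names])
    k = int(sparsity * flat.size)  # number of weights pruned model-wide
    keep = np.ones(flat.size, dtype=bool)
    keep[np.argsort(flat, kind="stable")[:k]] = False
    masks, offset = {}, 0
    for n in names:
        size = scores[n].size
        masks[n] = keep[offset:offset + size].reshape(scores[n].shape)
        offset += size
    return masks

def gradual_prune(weights, grad_fn, final_sparsity=0.7, steps=10):
    weights = {n: w.copy() for n, w in weights.items()}
    masks = {n: np.ones(w.shape, dtype=bool) for n, w in weights.items()}
    for t in range(1, steps + 1):
        grads = grad_fn(weights)
        # Already-pruned weights score -inf, so masks only ever shrink.
        scores = {n: np.where(masks[n], importance(w, grads[n]), -np.inf)
                  for n, w in weights.items()}
        masks = global_masks(scores, cubic_schedule(t, steps, final_sparsity))
        weights = {n: w * masks[n] for n, w in weights.items()}
        # A real pipeline would fine-tune between steps; omitted here.
    return weights, masks

# Toy usage: hypothetical layer names, random stand-in gradients.
rng = np.random.default_rng(0)
layers = {"in_proj": rng.normal(size=(8, 16)), "x_proj": rng.normal(size=(16, 4))}
fixed_grads = {n: rng.normal(size=w.shape) for n, w in layers.items()}
pruned, masks = gradual_prune(layers, lambda w: fixed_grads)
```

Ranking globally lets a layer whose weights matter more keep a larger share of its parameters, which is the intuition behind a "global strategy" for parameter allocation. Masking already-pruned weights with `-inf` keeps the schedule monotone, so weights removed early are never revived.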