ICML 2025

From Weight-Based to State-Based Fine-Tuning: Further Memory Reduction on LoRA with Parallel Control

Chi Zhang, Lianhai Ren, Jingpu Cheng, Qianxiao Li

Abstract

The LoRA method has achieved notable success in reducing GPU memory usage by applying low-rank updates to weight matrices. Yet a simple question remains: can we push this reduction even further? And can we do so while also reducing computation time and preserving performance? Answering these questions requires moving beyond the conventional weight-centric approach. In this paper, we present a state-based fine-tuning framework that shifts the focus from adapting weights to optimizing forward states, with LoRA as a special case. Specifically, state-based tuning introduces parameterized perturbations to the states within the computational graph, which allows us to control states across an entire residual block. A key advantage of this approach is that it can avoid storing large intermediate states in models such as transformers. Empirical results across multiple architectures, including ViT, RoBERTa, LLaMA2-7B, and LLaMA3-8B, show that our method further reduces memory consumption and computation time while preserving performance. Owing to this memory reduction, we also explore the feasibility of training 7B/8B models on consumer-level GPUs such as the NVIDIA RTX 3090, without model quantization. The code is available here.
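The following minimal pure-Python sketch makes the weight-based versus state-based distinction concrete. It assumes a single linear layer with a frozen weight W and LoRA factors A (r × d) and B (d × r). It checks that the merged-weight form (W + BA)x and the state-perturbation form Wx + B(Ax) produce the same output. The function `block_with_parallel_control` is a hypothetical illustration of applying one low-rank perturbation in parallel to a whole residual block x + f(x). It is an assumption made for exposition, not the paper's exact parameterization. All names here are illustrative.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def vadd(u, v):
    """Elementwise vector sum."""
    return [a + b for a, b in zip(u, v)]


def rand_matrix(rows, cols, rng, scale=1.0):
    """Gaussian random matrix as a list of rows."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def lora_weight_based(W, A, B, x):
    """Weight-based LoRA: form W + BA explicitly, then apply it to x."""
    BA = matmul(B, A)  # d x d update of rank at most r
    W_eff = [[w + u for w, u in zip(rw, ru)] for rw, ru in zip(W, BA)]
    return matvec(W_eff, x)


def lora_state_based(W, A, B, x):
    """Same map read as a state perturbation: frozen W x plus B (A x)."""
    return vadd(matvec(W, x), matvec(B, matvec(A, x)))


def block_with_parallel_control(f, A, B, x):
    """Hypothetical block-level control: residual block x + f(x) with a
    low-rank term B (A x) added in parallel to the whole block."""
    return vadd(vadd(x, f(x)), matvec(B, matvec(A, x)))


rng = random.Random(0)
d, r, h = 4, 2, 8  # model width, LoRA rank, hidden width of the example MLP
W = rand_matrix(d, d, rng)
A = rand_matrix(r, d, rng, scale=0.1)
B = rand_matrix(d, r, rng, scale=0.1)
x = [rng.gauss(0.0, 1.0) for _ in range(d)]

y_weight = lora_weight_based(W, A, B, x)
y_state = lora_state_based(W, A, B, x)
print(max(abs(a - b) for a, b in zip(y_weight, y_state)))  # only floating-point round-off

# A frozen two-layer ReLU MLP used as the residual branch f (hypothetical example).
W1 = rand_matrix(h, d, rng)
W2 = rand_matrix(d, h, rng)


def f(v):
    return matvec(W2, [max(0.0, t) for t in matvec(W1, v)])


B_zero = [[0.0] * r for _ in range(d)]  # zero-initialized B, as is standard for LoRA
y_block = block_with_parallel_control(f, A, B_zero, x)
```

The first check shows that a low-rank weight update is equivalent to an additive low-rank perturbation of the layer's output state. The block-level function moves that perturbation from a single weight to the output of an entire residual block. With B initialized to zero, the controlled block starts out identical to the frozen block. The sketch only illustrates the algebra and does not model memory usage.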