ICML2025
From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications
Ajay Kumar Jaiswal, Yifan Wang, Lu Yin, Shiwei Liu, Runjin Chen, Jiawei Zhao, Ananth Grama, Yuandong Tian, Zhangyang Wang
摘要
Large Language Models' (LLMs) weight matrices can often be expressed in low-rank form with potential to relax memory and compute resource requirements. Unlike prior efforts that focus on developing novel matrix decompositions, in this work we study the non-uniform low-rank properties of weight matrices in LLMs through the lens of stabilizing gradient subspace. First, we provide a theoretical framework to understand the stabilization of gradient subspaces through Hessian analysis. Second, we empirically establish an important relationship between gradient dynamics and low-rank expressiveness of weight matrices. Our findings reveal that different LLM components exhibit varying levels of converged lowrank structures, necessitating variable rank reduction across them to minimize drop in performance due to compression. Drawing on this result, we present Weight Low-Rank Projection (WeLore) that unifies weight compression and memoryefficient fine-tuning into one, in a data-agnostic and one-shot manner. When used as a compression technique, WeLore categorizes weight matrices into Low-rank Components (LRCs) and Non-Low-rank Components (N-LRCs) and suitably encodes them for minimum performance loss. Our gradient dynamics perspective illustrates that LRCs tend to have better finetuning capabilities and their standalone finetuning can closely mimic and sometimes outperform the training loss trajectory and performance of full-finetuning with notable memory and compute footprint reduction. Codes are available at https://github. com/VITA-Group/WeLore .