ICLR 2025

Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models

Jerry Yao-Chieh Hu, Maojiang Su, En-Jui Kuo, Zhao Song, Han Liu

Abstract

We study the computational limits of Low-Rank Adaptation (LoRA) for fine-tuning transformer-based models using fine-grained complexity theory. Our key observation is that the existence of low-rank decompositions within the gradient computation of LoRA adaptation leads to possible algorithmic speedups. This allows us to (i) identify a phase transition behavior of efficiency assuming the Strong Exponential Time Hypothesis (SETH), and (ii) prove the existence of almost linear algorithms by controlling the LoRA update computation term by term. For the former, we identify a sharp transition in the efficiency of all possible rank-$r$ LoRA update algorithms for transformers, based on specific norms resulting from the multiplications of the input sequence $X$, pretrained weights $W^\star$, and adapter matrices $\alpha BA/r$. Specifically, we derive a shared upper bound threshold for such norms, and show that efficient (sub-quadratic) approximation algorithms of LoRA exist only below this threshold. For the latter, we prove the existence of almost linear approximation algorithms for LoRA adaptation by utilizing the hierarchical low-rank structures of LoRA gradients and approximating the gradients with a series of chained low-rank approximations. To showcase our theory, we consider two practical scenarios: partial (e.g., only $W_V$ and $W_Q$) and full (e.g., $W_Q$, $W_V$, and $W_K$) adaptations of weights in attention heads.

* Code is available on OpenReview; full version and future updates are on arXiv.

The hardness of LoRA's forward pass is trivially characterized by Alman and Song (2023). To see this, let $X \in \mathbb{R}^{L \times d}$ be the input with length $L$, and let $W_K, W_Q, W_V \in \mathbb{R}^{d \times d}$ be the attention weights. With $Q = X W_Q$, $K = X W_K$, and $V = X W_V$, the softmax attention mechanism is

$$\mathrm{Att}(X) = D^{-1} \exp\left(\beta X W_Q W_K^\top X^\top\right) X W_V,$$

with the inverse temperature $\beta > 0$ and $D := \mathrm{diag}\left(\exp\left(\beta X W_Q W_K^\top X^\top\right) \mathbf{1}_L\right)$. Here, $\exp(\cdot)$ is the entry-wise exponential function, $\mathrm{diag}(\cdot)$ converts a vector into a diagonal matrix with the entries of the vector, and $\mathbf{1}_L$ is the length-$L$ all-ones vector.
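As a concrete illustration of the forward pass discussed above, the following is a minimal sketch of naive attention, assuming the standard softmax form $D^{-1} \exp(\beta X W_Q W_K^\top X^\top) X W_V$ with $D = \mathrm{diag}(\exp(\cdot)\mathbf{1}_L)$. The helper names (`matmul`, `transpose`, `attention_weights`, `attention`) are hypothetical and not taken from the paper. The sketch uses plain Python lists to stay self-contained and omits the usual max-subtraction for numerical stability. It materializes the full $L \times L$ matrix, which is the quadratic cost that sub-quadratic algorithms aim to avoid.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def attention_weights(X, W_Q, W_K, beta=1.0):
    """Return D^{-1} exp(beta * X W_Q W_K^T X^T), an L x L row-stochastic matrix."""
    Q, K = matmul(X, W_Q), matmul(X, W_K)
    scores = matmul(Q, transpose(K))                      # L x L: quadratic in L
    E = [[math.exp(beta * s) for s in row] for row in scores]
    D = [sum(row) for row in E]                           # diagonal of diag(E 1_L)
    return [[e / d for e in row] for row, d in zip(E, D)]

def attention(X, W_Q, W_K, W_V, beta=1.0):
    """Return D^{-1} exp(beta * X W_Q W_K^T X^T) X W_V for an L x d input X."""
    P = attention_weights(X, W_Q, W_K, beta)
    return matmul(P, matmul(X, W_V))

# Toy input (assumed for illustration): L = 3 tokens, d = 2 features.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
out = attention(X, I2, I2, I2, beta=0.5)
```

Each row of the weight matrix sums to one, so every output row is a convex combination of the rows of $X W_V$.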
LoRA fine-tuning is defined as follows.

Definition 1.1 (LoRA (Hu et al., 2021)). Let $W \in \mathbb{R}^{b \times a}$ be any weight matrix in a pretrained model $F$. LoRA fine-tunes $F$ by updating $W$ with a low-rank decomposition $W = W^\star + \frac{\alpha}{r} BA$. Here, $W^\star$ is the frozen pretrained weight. Only $B \in \mathbb{R}^{b \times r}$ and $A \in \mathbb{R}^{r \times a}$ are learnable (updated via gradient descent), with rank $r < \min(a, b)$ and tunable hyperparameter $\alpha \in \mathbb{R}$.

Under the Strong Exponential Time Hypothesis (Hypothesis 1), Alman and Song (2023) state:

Lemma 1.1 (Informal, (Alman and Song, 2023)). A fast (sub-quadratic) forward pass of a transformer exists only when the entries of $K$, $Q$, $V$ are bounded by a constant $B = \Theta(\sqrt{\log L})$.

for some $\epsilon > 0$, where $\|Z\|_\infty := \max_{i,j} |Z_{ij}|$.
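To make Definition 1.1 concrete, the following is a minimal sketch of the LoRA weight update $W = W^\star + \frac{\alpha}{r} BA$ together with the entry-wise max norm $\|Z\|_\infty$. The function names (`lora_weight`, `max_norm`) and the toy matrices are assumptions chosen for illustration, not code from the paper. Plain Python lists are used so the example is self-contained.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def lora_weight(W_star, B, A, alpha):
    """Return W = W_star + (alpha / r) * B A, with B of shape b x r and A of shape r x a."""
    r = len(A)                                   # rank = number of rows of A
    assert all(len(row) == r for row in B), "B must have r columns"
    BA = matmul(B, A)                            # b x a update of rank at most r
    scale = alpha / r
    return [[w + scale * u for w, u in zip(w_row, u_row)]
            for w_row, u_row in zip(W_star, BA)]

def max_norm(Z):
    """Entry-wise max norm: max over i, j of |Z_ij|."""
    return max(abs(z) for row in Z for z in row)

# Toy example (assumed values): b = 3, a = 2, r = 1 < min(a, b).
W_star = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]   # frozen pretrained weight
B = [[1.0], [2.0], [3.0]]                        # learnable, b x r
A = [[0.5, -0.5]]                                # learnable, r x a
W = lora_weight(W_star, B, A, alpha=2.0)

b, a, r = len(B), len(A[0]), len(A)
trainable = r * (a + b)                          # entries in B and A
full = a * b                                     # entries in W
```

In this toy case only $r(a + b) = 5$ entries are trained instead of $ab = 6$; the saving grows quickly when $r \ll \min(a, b)$.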