ICML 2025

Canonical Rank Adaptation: An Efficient Fine-Tuning Strategy for Vision Transformers

Lokesh Veeramacheneni, Moritz Wolter, Hilde Kuehne, Juergen Gall

Abstract

Modern methods for fine-tuning Vision Transformers, such as Low-Rank Adaptation (LoRA) and its variants, demonstrate impressive performance. However, these methods ignore the high-dimensional nature of Multi-Head Attention (MHA) weight tensors. To address this limitation, we propose Canonical Rank Adaptation (CaRA). CaRA leverages tensor mathematics in two steps. First, it tensorises the transformer into two tensors: one for the projection layers in MHA and one for the feed-forward layers. Second, it fine-tunes the tensorised formulation with a low-rank update in Canonical-Polyadic Decomposition (CPD) form. CaRA thereby reduces the number of trainable parameters. Experimentally, CaRA outperforms existing Parameter-Efficient Fine-Tuning (PEFT) methods on visual classification benchmarks such as the Visual Task Adaptation Benchmark (VTAB)-1k and the Fine-Grained Visual Categorization (FGVC) benchmark.
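To make the idea of a low-rank update in CPD form more concrete, the following NumPy sketch builds an update for a stacked weight tensor as a sum of rank-one terms, one factor matrix per tensor mode. It is an illustration only, not the authors' implementation: the four-mode layout (layers, model width, heads, head width), the toy sizes, the rank, the helper name `cp_update`, and the zero initialisation of one factor are all assumptions made for this example.

```python
import numpy as np


def cp_update(factors):
    """Build a 4-mode tensor from CP factors.

    Each factor has shape (mode_size, rank). The result is the sum over r
    of the outer products of the r-th columns of all four factors.
    """
    a, b, c, d = factors
    return np.einsum("lr,ir,hr,kr->lihk", a, b, c, d)


rng = np.random.default_rng(0)

# Hypothetical toy sizes: (layers, model width, heads, head width).
shape = (2, 8, 2, 4)
rank = 3

# Frozen pre-trained weights, stacked into one tensor (assumed layout).
frozen = rng.standard_normal(shape)

# Trainable CP factors. The last one starts at zero so that the update
# vanishes before training (an assumed, LoRA-style initialisation).
factors = [rng.standard_normal((n, rank)) for n in shape[:-1]]
factors.append(np.zeros((shape[-1], rank)))

delta = cp_update(factors)
adapted = frozen + delta

# Trainable parameters grow with the sum of the mode sizes,
# while the full tensor grows with their product.
n_trainable = rank * sum(shape)
n_full = int(np.prod(shape))
print(n_trainable, n_full)
```

Under these assumptions, the number of trainable values is the rank times the sum of the mode sizes, whereas a full update would need one value per tensor entry. This is the sense in which a CPD-form update can stay small even as the tensor gains modes.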