ACL 2024

ResLoRA: Identity Residual Mapping in Low-Rank Adaption

Shuhua Shi, Shaohan Huang, Minghui Song, Zhoujun Li, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang

Abstract

As one of the most popular parameter-efficient fine-tuning (PEFT) methods, low-rank adaptation (LoRA) is commonly applied to fine-tune large language models (LLMs). However, updating the weights of LoRA blocks effectively and expeditiously is challenging due to the long calculation path in the original model. To address this, we propose ResLoRA, an improved framework of LoRA. By adding residual paths during training and using merging approaches to eliminate these extra paths during inference, our method can achieve better results in fewer training steps without any extra trainable parameters or inference cost compared to LoRA. The experiments on NLG, NLU, and text-to-image tasks demonstrate the effectiveness of our method. To the best of our knowledge, ResLoRA is the first work that combines the residual path with LoRA. The code of our method is available at https://anonymous.4open.science/r/ResLoRA-E25E.

and computation after merging, and has been mathematically proven (Zeng and Lee, 2023) to be effective, so it has a wide range of applications.

The basic LoRA method still has some limitations. Previous studies mainly focused on either dynamically adjusting the rank of LoRA modules in different layers of the model (Zhang et al., 2023a), or using fewer trainable parameters to achieve an effect similar to that of the original LoRA method (Valipour et al., 2022). However, they overlooked a potential problem: a long backward path hinders the updating of parameters in LoRA blocks.

As a prominent method, ResNet (He et al., 2016a,b) has proven to be widely efficient, and is also used in Transformer models (Vaswani et al., 2017), between different encoder and decoder blocks. Parallel to linear layers in these blocks, LoRA blocks can also benefit from the original shortcut design. However, unlike linear layers, LoRA blocks are more fine-grained. One LoRA block corresponds to only one linear layer, so the original shortcut is not perfectly suitable for LoRA blocks. Let's use the encoders of the Transformer as an example. If we add

2 Related Works

Parameter-efficient fine-tuning (PEFT). Research on PEFT can be divided into three types. One line