ACL 2024

Learning or Self-aligning? Rethinking Instruction Fine-tuning

Mengjie Ren, Boxi Cao, Hongyu Lin, Cao Liu, Xianpei Han, Ke Zeng, Guanglu Wan, Xunliang Cai, Le Sun

Abstract

Instruction Fine-tuning (IFT) is a critical phase in building Large Language Models (LLMs). Previous works mainly focus on IFT's role in the transfer of behavioral norms and the learning of additional world knowledge. However, the understanding of the underlying mechanisms of IFT remains significantly limited. In this paper, we design a knowledge intervention framework to decouple the potential underlying factors of IFT, thereby enabling individual analysis of different factors. Surprisingly, our experiments reveal that attempting to learn additional world knowledge through IFT often struggles to yield positive impacts and can even lead to markedly negative effects. Further, we discover that maintaining internal knowledge consistency before and after IFT is a critical factor for achieving successful IFT. Our findings reveal the underlying mechanisms of IFT and provide robust support for some very recent and potential future works.

[...] contained within the IFT data, thereby enabling more effective parameter-entailed knowledge expression (Zhou et al., 2023). On the other hand, many existing studies aim to facilitate domain-specific adaptation of LLMs through IFT, by injecting the world knowledge contained in IFT data into LLMs (Li et al., 2023; Cui et al., 2023).

Unfortunately, both the transfer of behavioral norms and the enhancement of domain knowledge are closely coupled with the corpus applied in IFT, rendering the analysis of IFT's true effects exceedingly challenging. Due to the interconnection of these two effects, it is challenging to discern whether the benefits derived are due to the promotion of more effective expression of parameter knowledge or the injection of additional world knowledge. The coupling between the above two factors, along with a lack of in-depth analysis of the IFT mechanism, hampers our comprehension of the effectiveness of IFT.
This limitation hinders our ability to develop robust strategies for IFT data construction, model training, and model evaluation due to insufficient theoretical support. Therefore, a thorough and comprehensive analysis of the underlying core factors that drive the effectiveness of IFT is crucial for achieving more effective IFT.

To this end, this paper designs a knowledge intervention framework for analyzing the underlying mechanisms of IFT. The main idea of our framework is to control the consistency between the knowledge in IFT data and the existing parameter knowledge of LLM, in order to decouple the injection of domain knowledge from the transfer of behavior norms during IFT. This allows for a separate analysis of the roles of these two crucial IFT factors. Specifically, we first employ in-context learning (ICL) (Dong et al., 2023; Brown et al., 2020) to probe the internal parameter knowledge of LLMs. Building on this, we intervene in the composition of existing parameter knowledge and the newly introduced world knowledge within IFT data and then observe the differences in model behavior after IFT under different intervention groups. Based on the framework, we conduct an in-depth analysis to answer the following two critical research questions (RQ):

• RQ1: How does the world knowledge in IFT data affect LLMs?
• RQ2: What is the underlying cause of the above impact?

For RQ1, we initially discover that significant discrepancies between the world knowledge contained in IFT data and the existing parameter knowledge within LLMs can substantially undermine the models' capabilities.
Performance derived from a set of IFT data that contains incorrect world knowledge but aligns with the model's parameter knowledge is significantly better than that from a set containing correct world knowledge but inconsistent with the model's internal parameter knowledge. To dive into this phenomenon, we explicitly provide LLMs with the necessary world knowledge in the context, enabling the model to only transform the output behavioral norms rather than jointly learn the inconsistent world knowledge. We discover that the detrimental effects caused by the inconsistency between parameter knowledge and world knowledge can be significantly mitigated by explicitly providing such self-contained IFT datapoints.
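
To make the intervention idea concrete, the following is a minimal Python sketch of the three steps described above: probing parameter knowledge with a few-shot prompt, splitting candidate IFT examples by whether the model's own answer agrees with the reference answer, and building a self-contained datapoint that carries the needed fact in its context. It is an illustration under stated assumptions, not the paper's implementation. All names (`Example`, `build_few_shot_prompt`, `probe_parameter_knowledge`, `split_by_consistency`, `make_self_contained`, `toy_generate`) are hypothetical, the prompt template and the exact-match comparison are assumptions, and `toy_generate` is a lookup-table stand-in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Example:
    question: str
    gold_answer: str  # reference answer according to world knowledge


def build_few_shot_prompt(demos: Sequence[Example], question: str) -> str:
    """Format demonstrations plus the target question as an ICL prompt (assumed template)."""
    shots = "\n\n".join(f"Q: {d.question}\nA: {d.gold_answer}" for d in demos)
    return f"{shots}\n\nQ: {question}\nA:"


def probe_parameter_knowledge(
    generate: Callable[[str], str],
    demos: Sequence[Example],
    pool: Sequence[Example],
) -> Dict[str, str]:
    """Record the base model's own few-shot answer for every question in the pool."""
    return {
        ex.question: generate(build_few_shot_prompt(demos, ex.question)).strip()
        for ex in pool
    }


def split_by_consistency(
    pool: Sequence[Example], probed: Dict[str, str]
) -> Dict[str, List[Example]]:
    """Group examples by whether the probed answer matches the reference answer.

    Assumption: case-insensitive exact match stands in for whatever
    answer-matching rule a real study would use.
    """
    groups: Dict[str, List[Example]] = {"consistent": [], "inconsistent": []}
    for ex in pool:
        same = probed[ex.question].strip().lower() == ex.gold_answer.strip().lower()
        groups["consistent" if same else "inconsistent"].append(ex)
    return groups


def make_self_contained(ex: Example, evidence: str) -> Dict[str, str]:
    """Place the required fact in the prompt so only the output format must be learned."""
    return {
        "prompt": f"Context: {evidence}\nQ: {ex.question}\nA:",
        "response": ex.gold_answer,
    }


def toy_generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: answers the last question from a fixed table."""
    beliefs = {"Capital of France?": "Paris", "Capital of Australia?": "Sydney"}
    question = prompt.rsplit("Q: ", 1)[1].split("\nA:")[0]
    return beliefs.get(question, "unknown")
```

With a real model in place of `toy_generate`, the "consistent" and "inconsistent" groups could be mixed in controlled proportions to form different intervention groups, and `make_self_contained` could be applied to the inconsistent examples to test whether supplying the fact in context reduces the harm.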