EMNLP 2025

Improving Instruct Models for Free: A Study on Partial Adaptation

Ozan Irsoy, Pengxiang Cheng, Jennifer L. Chen, Daniel Preotiuc-Pietro, Shiyue Zhang, Duccio Pappadopulo

Abstract

Instruct models, obtained from various instruction tuning or post-training steps, are commonly deemed superior to and more usable than their base counterparts. While the model gains instruction following ability, instruction tuning may lead to forgetting of knowledge from pre-training, or it may encourage the model to become overly conversational or verbose. This, in turn, can degrade in-context few-shot learning performance. In this work, we study the performance trajectory between base and instruct models by scaling down the strength of instruction tuning via the partial adaptation method. We show that, across several model families and model sizes, reducing the strength of instruction tuning results in material improvement on a few-shot in-context learning benchmark covering a variety of classic natural language tasks. This comes at the cost of losing some degree of instruction following ability as measured by AlpacaEval. Our study sheds light on a potential trade-off between in-context learning and instruction following abilities that is worth considering in practice.

* Work done while at Bloomberg. Author ordering chosen at random.

[Figure: one panel per model family (Llama-2 7B and 70B; Llama-3 8B and 70B; Llama-3.1 8B and 70B; Llama-3.2 1B and 3B with Llama-3.3 70B; Mistral 7B v0.1, Mistral 7B v0.3, and Mistral Nemo 12B; Mixtral 8x7B v0.1 and 8x22B v0.1; OLMo 7B, OLMo 2 7B, and OLMo 2 13B; Gemma-2 9B). The x-axis runs from 0.0 to 1.0; the y-axis shows % change from the instruct model (M_1).]
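To make "scaling down the strength of instruction tuning" concrete, the sketch below shows one plausible reading of partial adaptation: a linear interpolation in weight space between a base model M_0 and its instruct counterpart M_1, giving M_λ = M_0 + λ (M_1 − M_0) with λ in [0, 1]. The abstract does not state this formula, so the interpolation rule, the function name `partial_adapt`, the parameter `lam`, and the toy weights are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of weights.
# (Assumption: real checkpoints would hold tensors; lists keep this self-contained.)
Weights = Dict[str, List[float]]


def partial_adapt(base: Weights, instruct: Weights, lam: float) -> Weights:
    """Return base + lam * (instruct - base) for every parameter.

    lam = 0.0 recovers the base model, lam = 1.0 the instruct model;
    values in between scale down the instruction-tuning update.
    """
    if base.keys() != instruct.keys():
        raise ValueError("base and instruct must share parameter names")
    mixed: Weights = {}
    for name, base_vals in base.items():
        inst_vals = instruct[name]
        if len(base_vals) != len(inst_vals):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        mixed[name] = [b + lam * (i - b) for b, i in zip(base_vals, inst_vals)]
    return mixed


# Hypothetical toy checkpoints, not real model weights.
base = {"layer.weight": [0.0, 1.0, 2.0]}
instruct = {"layer.weight": [1.0, 1.0, 4.0]}

half = partial_adapt(base, instruct, 0.5)
print(half)
```

Under this reading, sweeping `lam` over a grid between 0 and 1 and evaluating each mixed checkpoint would trace a performance trajectory between the base and instruct models.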