ICML 2025
Focus On This, Not That! Steering LLMs with Adaptive Feature Specification
Tom A. Lamb, Adam Davies, Alasdair Paren, Philip Torr, Francesco Pinto
Abstract
Motivation, Methodology, Experiments

Focus Instruction Tuning (FIT)
- The paper introduces a new fine-tuning method, termed Focus Instruction Tuning (FIT).
- The goal is to naturally encourage steerability and test-time adaptability in LLMs with respect to user feature specifications.

AI Safety, LLMs, LLM Steering

Results: SMNLI Dataset
Results under distribution shift (of feature values)
Key Takeaways: FIT achieves strong steerability, which is maintained under distribution shift. This demonstrates FIT's generalisation to new contexts with changing feature values.