ACL 2025
Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas
Nishant Balepur, Vishakh Padmakumar, Fumeng Yang, Shi Feng, Rachel Rudinger, Jordan Lee Boyd-Graber
Abstract
[Figure: overview of persona inference and persona tailoring, using the running example prompt "My school is having a cake drive. Would brownies be okay to take?"]
Typical Preference Dataset: each entry holds a prompt, a chosen response, and a rejected response. Guiding question: Can abductive reasoning reveal why users may prefer responses?
Persona Inference (§2, 3): Chosen Persona: "The user values simplicity and prefers direct, concise answers without additional details" (response: "Yes, brownies would be a great…"). Rejected Persona: "The user is practical, preferring responses that include logistical considerations" (response: "Yes, brownies would be a great contribution…").
Persona Tailoring (§4, 5): the input is a prompt plus a persona, and the output is a response. Methods: 1) Few-shot Prompting, 2) Supervised Fine-Tuning, 3) Direct Preference Optimization.
Example outputs: Direct Preference Opt. (DPO) is compared with DPO + Persona Tailoring (Ours) on responses about livening up a party.
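The figure pairs each prompt with an inferred persona before producing a response. A minimal sketch of that record layout follows, assuming a plain Python representation. The class name `PreferenceExample`, its field names, and the helpers `format_tailoring_input` and `to_sft_pair` are hypothetical names chosen for illustration and are not taken from the paper. The "Prompt: … / Persona: …" layout mirrors the labels in the figure, and the truncated responses are kept truncated as they appear there.

```python
from dataclasses import dataclass


@dataclass
class PreferenceExample:
    """One preference pair plus inferred personas (hypothetical layout)."""
    prompt: str
    chosen_response: str
    rejected_response: str
    chosen_persona: str    # assumed: persona inferred for the chosen response
    rejected_persona: str  # assumed: persona inferred for the rejected response


def format_tailoring_input(prompt: str, persona: str) -> str:
    """Join a prompt and a persona using the labels shown in the figure."""
    return f"Prompt: {prompt}\nPersona: {persona}"


def to_sft_pair(ex: PreferenceExample) -> tuple[str, str]:
    """Build one (input, target) pair from the chosen persona and response.

    This is an assumed way to prepare supervised fine-tuning data, not a
    confirmed description of the authors' pipeline.
    """
    return format_tailoring_input(ex.prompt, ex.chosen_persona), ex.chosen_response


# Values copied from the figure; elided text ("…") is left elided.
example = PreferenceExample(
    prompt="My school is having a cake drive. Would brownies be okay to take?",
    chosen_response="Yes, brownies would be a great…",
    rejected_response="Yes, brownies would be a great contribution…",
    chosen_persona=(
        "The user values simplicity and prefers direct, concise answers "
        "without additional details"
    ),
    rejected_persona=(
        "The user is practical, preferring responses that include "
        "logistical considerations"
    ),
)

if __name__ == "__main__":
    model_input, target = to_sft_pair(example)
    print(model_input)
    print("Response:", target)
```

Keeping the personas as separate fields, rather than baking them into the prompt string, leaves the original preference pair intact, so the same record could still feed a persona-free baseline.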