EMNLP 2024

Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models

Somanshu Singla, Zhen Wang, Tianyang Liu, Abdullah Ashfaq, Zhiting Hu, Eric P. Xing

Abstract

Aligning Large Language Models (LLMs) traditionally relies on costly training and human preference annotations. Self-alignment aims to reduce these expenses by having models align themselves. To further minimize cost and enable LLM alignment without expensive tuning or annotations, we introduce a new tuning-free approach to self-alignment called Dynamic Rewarding with Prompt Optimization (DRPO). Our approach leverages a search-based optimization framework that allows LLMs to iteratively self-improve and design the best alignment instructions without additional training or human intervention. The core of DRPO is a dynamic rewarding mechanism that identifies and rectifies model-specific alignment weaknesses, allowing LLMs to adapt efficiently to diverse alignment challenges. Empirical evaluations on eight recent LLMs, both open- and closed-source, reveal that DRPO significantly enhances alignment performance, with base models outperforming their SFT/RLHF-tuned counterparts. Moreover, DRPO's automatically optimized prompts surpass those curated by human experts, further validating the effectiveness of our approach. Our findings highlight the great potential of current LLMs to be adaptively self-aligned through inference-time optimization, complementing existing tuning-based alignment research.
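The abstract describes DRPO only at a high level. As a toy illustration of the general idea (search over prompts guided by a reward whose criteria change per query), and explicitly not the authors' algorithm, the sketch below runs a small beam search over system prompts. Each candidate is scored on criteria chosen dynamically for each query, and the weakest criterion drives the next prompt edit. The criteria names, `select_criteria`, `toy_model`, `reward`, `optimize_prompt`, and the "Prioritize ..." edit rule are all hypothetical stand-ins for a real LLM and an LLM-based judge.

```python
from typing import Callable

# Hypothetical criteria; the abstract does not name DRPO's actual ones.
CRITERIA = ("helpfulness", "safety", "clarity")

# A "model" maps (system prompt, query) to per-criterion judge scores in [0, 1].
Model = Callable[[str, str], dict[str, float]]


def select_criteria(query: str) -> tuple[str, ...]:
    """Toy dynamic rewarding: choose which criteria matter for this query."""
    if "hack" in query or "weapon" in query:
        return ("safety", "clarity")
    return ("helpfulness", "clarity")


def toy_model(prompt: str, query: str) -> dict[str, float]:
    """Hypothetical stand-in for an LLM plus judge: a criterion scores high
    only if the prompt explicitly mentions it (the query is ignored here)."""
    return {c: (1.0 if c in prompt else 0.2) for c in CRITERIA}


def reward(prompt: str, queries: list[str], model: Model) -> tuple[float, str]:
    """Mean score over the criteria selected per query, plus the weakest one."""
    per_criterion: dict[str, list[float]] = {}
    for q in queries:
        scores = model(prompt, q)
        for c in select_criteria(q):
            per_criterion.setdefault(c, []).append(scores[c])
    means = {c: sum(v) / len(v) for c, v in per_criterion.items()}
    overall = sum(means.values()) / len(means)
    weakest = min(means, key=means.get)  # ties resolve to the first-seen criterion
    return overall, weakest


def optimize_prompt(seed: str, queries: list[str], model: Model,
                    steps: int = 3, beam: int = 2) -> str:
    """Beam search over prompts; each step patches a prompt's weakest criterion."""
    frontier = [seed]
    for _ in range(steps):
        candidates = list(frontier)
        for p in frontier:
            _, weakest = reward(p, queries, model)
            child = f"{p} Prioritize {weakest}."
            if child not in candidates:
                candidates.append(child)
        # sorted() is stable, so equal-reward prompts keep insertion order
        frontier = sorted(candidates, key=lambda p: reward(p, queries, model)[0],
                          reverse=True)[:beam]
    return frontier[0]


queries = ["How do I bake bread?", "How to hack a bank?"]
best = optimize_prompt("You are a helpful assistant.", queries, toy_model)
print(best)  # You are a helpful assistant. Prioritize helpfulness. Prioritize clarity. Prioritize safety.
print(reward(best, queries, toy_model)[0])  # 1.0
```

Because the reward only counts criteria relevant to each query, the search patches whichever weakness is currently dragging the score down, rather than optimizing a single fixed objective. In a real system the toy scorer would be replaced by model generations and an LLM judge, and the string-append edit by model-proposed prompt revisions.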