EMNLP 2025

STEER-BENCH: A Benchmark for Evaluating the Steerability of Large Language Models

Kai Chen, Zihao He, Taiwei Shi, Kristina Lerman

Abstract

Steerability, the ability of large language models (LLMs) to adapt their outputs to diverse community-specific norms, perspectives, and communication styles, is critical for real-world applications but remains under-evaluated. We introduce STEER-BENCH, a benchmark for assessing population-specific steering using contrasting Reddit communities. Covering 30 contrasting subreddit pairs across 19 domains, STEER-BENCH includes over 10,000 instruction-response pairs and 5,500 validated multiple-choice questions with corresponding silver labels to test alignment with diverse community norms. It systematically assesses how effectively LLMs understand community-specific instructions, how resilient they are to adversarial steering attempts, and how accurately they represent diverse cultural and ideological perspectives. Our evaluation of 13 popular LLMs using STEER-BENCH reveals that while human experts achieve 81% accuracy against the silver labels, the best-performing models reach only around 65% accuracy, depending on the domain and configuration. Some models lag behind human-level alignment by over 15 percentage points, highlighting significant gaps in community-sensitive steerability.

[Figure: example benchmark items, with panels labeled "Steered LLM" and "Vanilla LLM"]

Instruction: Why might some users decide to switch to Linux?
Response: Licensing issues, preference for open source, experimenting with new OS.

Instruction: Why might some users decide to switch to Linux?
Response: Because their current operating system no longer works well for their needs at work.

Multiple-choice question: What motivates some Windows users to try Linux?
A. Curiosity about open-source software.
B. Frustration with Windows updates.
C. Influence from tech industry trends.
D. Desire to explore new GUI options.
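The abstract reports multiple-choice accuracy against silver labels and a gap to human experts. A minimal sketch of how such a score could be computed follows. The `MCQItem` type, the `accuracy` function, and the toy items and predictions are hypothetical illustrations, not the benchmark's actual data format or evaluation code; only the 81% human accuracy comes from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MCQItem:
    """Hypothetical container for one multiple-choice question."""
    question_id: str
    silver_label: str  # one of "A", "B", "C", "D"


def accuracy(items, predictions):
    """Fraction of items whose predicted option matches the silver label.

    A missing prediction counts as incorrect.
    """
    if not items:
        raise ValueError("no items to score")
    correct = sum(
        1
        for item in items
        if predictions.get(item.question_id, "").strip().upper() == item.silver_label
    )
    return correct / len(items)


# Toy placeholder data, made up for illustration (not from STEER-BENCH).
items = [
    MCQItem("q1", "A"),
    MCQItem("q2", "C"),
    MCQItem("q3", "B"),
    MCQItem("q4", "D"),
]
predictions = {"q1": "A", "q2": "b", "q3": "B", "q4": "D"}

model_acc = accuracy(items, predictions)

HUMAN_ACC = 0.81  # human expert accuracy reported in the abstract
gap_points = (HUMAN_ACC - model_acc) * 100

print(f"accuracy={model_acc:.2f}, gap to human experts={gap_points:.1f} points")
```

In this sketch, predictions are normalized to upper case before comparison, so "b" is read as option B and still counts as wrong for an item whose silver label is C.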