ACL 2025

NewsInterview: a Dataset and a Playground to Evaluate LLMs' Grounding Gap via Informational Interviews

Alexander Spangher, Michael Lu, Sriya Kalyan, Hyundong Justin Cho, Tenghao Huang, Weiyan Shi, Jonathan May

Cited by 2

Abstract

Large Language Models (LLMs) have demonstrated impressive capabilities in generating coherent text but often struggle with strategic dialogue. To address this gap, we focus on journalistic interviews. We curate a dataset of 40,000 two-person informational interviews from major news organizations in scenarios where human interviewers employ strategies to coax information from sources. We then try to mimic these activities with LLMs and find striking differences; models are less likely to use acknowledgments and more likely to rabbit-hole and not pivot to other topics. Realizing that a fundamental deficit exists in LLM multi-turn planning and strategic thinking, we develop a realistic simulated environment, incorporating source personas and persuasive elements, in order to facilitate the development of agents with long-horizon rewards. Our experiments show that mimicry failures are not two-sided; when posing as a source, models adequately reflect human behavior in information sharing, making our simulation a realistic benchmark. Interviewer-LLMs, however, struggle with engaging persuasively, leading to sub-optimal information extraction across model size and capability. This simulated game lays the groundwork for future work in enhancing LLMs’ strategic dialogue capabilities.