ACL 2025

Substance over Style: Evaluating Proactive Conversational Coaching Agents

Vidya Srinivas, Xuhai Xu, Xin Liu, Kumar Ayush, Isaac R. Galatzer-Levy, Shwetak N. Patel, Daniel McDuff, Tim Althoff


Abstract

While NLP research has made strides in conversational tasks, many approaches focus on single-turn responses with well-defined objectives or evaluation criteria. In contrast, coaching presents unique challenges with initially undefined goals that evolve through multi-turn interactions, subjective evaluation criteria, and mixed-initiative dialogue. In this work, we describe and implement five multi-turn coaching agents that exhibit distinct conversational styles, and evaluate them through a user study, collecting first-person feedback on 155 conversations. We find that users highly value core functionality, and that stylistic components in the absence of core components are viewed negatively. By comparing user feedback with third-person evaluations from health experts and an LM, we reveal significant misalignment across evaluation approaches. Our findings provide insights into the design and evaluation of conversational coaching agents and contribute toward improving human-centered NLP applications.

* Work done during an internship at Google