EMNLP 2025
Towards AI-Assisted Psychotherapy: Emotion-Guided Generative Interventions
Kilichbek Haydarov, Youssef Mohamed, Emilio Goldenhersch, Paul O'Callaghan, Li-jia Li, Mohamed Elhoseiny
Abstract
Large language models (LLMs) hold promise for therapeutic interventions, yet most existing datasets rely solely on text, overlooking the nonverbal emotional cues essential to real-world therapy. To address this, we introduce a multimodal dataset of 1,441 publicly sourced therapy session videos containing both dialogue and nonverbal signals such as facial expressions and vocal tone. Inspired by Hochschild's concept of emotional labor, we propose a computational formulation of emotional dissonance (the mismatch between facial and vocal emotion) and use it to guide emotionally aware prompting. Our experiments show that integrating multimodal cues, especially dissonance, improves the quality of generated interventions. We also find that LLM-based evaluators misalign with expert assessments in this domain, highlighting the need for human-centered evaluation.