ACL 2025

AUTOSUMM: A Comprehensive Framework for LLM-Based Conversation Summarization

Abhinav Gupta, Devendra Singh Chaplot, Greig A. Cowan, N. Kadhiresan, Siddharth Srivastava, Yagneswaran Sriraja, Yoages Kumar Mantri

Abstract

We present AUTOSUMM, a large language model (LLM)-based summarization system deployed in a regulated banking environment to generate accurate, privacy-compliant summaries of customer-advisor conversations. The system addresses challenges unique to this domain, including speaker attribution errors, hallucination risks, and short or low-information transcripts. Our architecture integrates dynamic transcript segmentation, thematic coverage tracking, and a domain-specific, multilayered hallucination detection module that combines syntactic, semantic, and entailment-based checks. Human-in-the-loop feedback from over 300 advisors supports continuous refinement and auditability. Empirically, AUTOSUMM achieves a 94% factual consistency rate and a significant reduction in the hallucination rate. In production, 89% of summaries required no edits, and only 1% required major corrections. A structured model version management pipeline ensures stable upgrades with minimal disruption. We detail our deployment methodology, monitoring strategy, and ethical safeguards, showing how LLMs can be reliably integrated into high-stakes, regulated workflows.

* The authors' names are listed alphabetically, with all contributing equally.
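As a rough illustration of how syntactic, semantic, and entailment-based checks might be layered, the following minimal Python sketch flags summary sentences that fail any layer. It is not the AUTOSUMM implementation: the function names, the numeric-token heuristic, the word-overlap proxy, the `min_overlap` threshold, and the `entails` callable (a placeholder for an entailment model) are all assumptions made for illustration.

```python
import re
from typing import Callable, List, Tuple

NUMBER = r"\d+(?:\.\d+)?"

def syntactic_check(sentence: str, transcript: str) -> List[str]:
    """Return numeric tokens in the sentence that never appear in the transcript."""
    known = set(re.findall(NUMBER, transcript))
    return [n for n in re.findall(NUMBER, sentence) if n not in known]

def semantic_check(sentence: str, transcript: str) -> float:
    """Crude semantic proxy: share of longer words in the sentence found in the transcript."""
    words = {w for w in re.findall(r"[a-z]+", sentence.lower()) if len(w) > 3}
    if not words:
        return 1.0
    source = set(re.findall(r"[a-z]+", transcript.lower()))
    return len(words & source) / len(words)

def detect_hallucinations(
    summary: str,
    transcript: str,
    entails: Callable[[str, str], bool],  # hypothetical hook: (premise, hypothesis) -> bool
    min_overlap: float = 0.5,             # assumed threshold, not taken from the paper
) -> List[Tuple[str, List[str]]]:
    """Run each summary sentence through all three layers and collect failures."""
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", summary.strip()):
        if not sentence:
            continue
        reasons = []
        if syntactic_check(sentence, transcript):
            reasons.append("syntactic")
        if semantic_check(sentence, transcript) < min_overlap:
            reasons.append("semantic")
        if not entails(transcript, sentence):
            reasons.append("entailment")
        if reasons:
            flagged.append((sentence, reasons))
    return flagged
```

In this sketch every layer runs on every sentence, so a flagged sentence carries the full list of layers it failed; a deployed system could instead run the cheaper checks first and skip the entailment layer for sentences already flagged.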