ASE 2025

Detecting and Repairing Incomplete Software Requirements with Multi-LLM Ensembles

Mohamad Kassab, Marwan AbdElhameed

Abstract

Ensuring that software requirements specifications (SRS) are complete is critical to preventing costly downstream errors. We present a tool that ensembles three complementary LLMs (DeepSeek Chat, GPT-4o Mini, and Claude Sonnet 4) to detect missing requirements and suggest remedies for them. The tool generates a structured domain model and runs external and internal completeness checks in parallel through tailored prompts. Users can choose which LLMs to run and how to aggregate their outputs (majority, weighted, or meta-fusion). Unlike prior single-model approaches, we systematically evaluate aggregation strategies across four diverse SRS domains. In experiments with seeded omissions, single models achieved only 0–52% recall, whereas our ensemble consistently exceeded 75% and reached up to 100%, with 95–100% plausibility. These results demonstrate the feasibility of multi-LLM ensembles as practical aids that complement rather than replace human analysts and that support interactive refinement workflows.
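To make the aggregation step concrete, the following is a minimal sketch of how majority and weighted voting over per-model findings might look, together with recall against seeded omissions. It is not the tool's implementation: the function names (`aggregate`, `recall`), the requirement identifiers, the model labels, the weights, and the 0.5 threshold are all hypothetical choices made for illustration, and meta-fusion is not sketched because the abstract does not describe it.

```python
from collections import defaultdict


def aggregate(findings, weights=None, threshold=0.5):
    """Combine per-model findings by (weighted) voting.

    findings:  dict mapping a model label to the set of missing-requirement
               ids that model flagged (hypothetical representation).
    weights:   optional dict mapping model label to a vote weight;
               equal weights give plain majority voting.
    threshold: an item is kept when its share of the total weight
               is strictly greater than this value (assumed cut-off).
    """
    if weights is None:
        weights = {model: 1.0 for model in findings}
    total = sum(weights[model] for model in findings)
    score = defaultdict(float)
    for model, flagged in findings.items():
        for item in flagged:
            score[item] += weights[model]
    return {item for item, s in score.items() if s / total > threshold}


def recall(detected, seeded):
    """Fraction of seeded omissions that were detected."""
    if not seeded:
        return 0.0
    return len(detected & seeded) / len(seeded)


# Hypothetical findings from three models on one SRS.
findings = {
    "model_a": {"R1", "R2"},
    "model_b": {"R1"},
    "model_c": {"R1", "R3"},
}
majority = aggregate(findings)
weighted = aggregate(findings, weights={"model_a": 3, "model_b": 1, "model_c": 1})
```

With equal weights only the item flagged by at least two of the three models survives; giving one model a larger weight lets its solo findings pass the threshold. How the actual tool sets weights or thresholds is not stated in the abstract.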