WWW 2026

Mitigating Cognitive Vulnerabilities in Code Generation via Multi-Agent Adversarial Debate

Shuofu Liu, Quanjiang Guo, Xiao Liu, Ying Liu

Cited by 1

Abstract

Although Large Language Models (LLMs) have demonstrated significant proficiency in code generation, their monolithic and correlation-driven nature renders them susceptible to systematic cognitive biases, deficient counterfactual reasoning, and adversarial manipulation—a characteristic we term cognitive vulnerability. Such vulnerabilities compromise the reliability and security of AI-assisted software development, potentially resulting in code that is not only functionally incorrect but also biased, insecure, and difficult to validate using conventional testing paradigms. While recent multi-agent systems enhance workflow efficiency via task decomposition, they do not fundamentally address these reasoning deficits. This highlights the need for a framework capable of proactively identifying and mitigating the cognitive flaws of an LLM during the reasoning process. To address this challenge, we introduce CodeForge, a multi-agent adversarial reasoning framework that reframes code generation as a cognitive crucible. This process involves a structured debate among three specialized agents—an Optimist, a Pragmatist, and an Adversarial Skeptic—who iteratively cross-examine and refine solution plans. Following convergence, an adversarial verification module systematically generates counterfactual perturbations to stress-test and enhance the final plan. Comprehensive evaluations on the HumanEval and MBPP benchmarks demonstrate that CodeForge significantly outperforms state-of-the-art methods, achieving a pass@1 of 97.3% with GPT-4. Ablation studies confirm the necessity of both the adversarial dialogue and counterfactual verification components. This work represents a shift from passive debugging to proactive cognitive hardening, establishing a pathway toward more trustworthy automated software engineering.
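The two-stage process described in the abstract (a structured debate that runs until convergence, followed by counterfactual stress-testing of the converged plan) can be illustrated with a minimal sketch. This is not the authors' implementation. The names `debate`, `counterfactual_check`, `revise`, and `survives`, the convergence rule (no agent raises an objection), the round limit, and the string-matching stub agents are all assumptions made for illustration. In a real system each agent would presumably be an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical agent type: maps (task, current plan) to a critique.
# An empty string means the agent has no objection to the plan.
Agent = Callable[[str, str], str]


@dataclass
class DebateResult:
    plan: str
    rounds: int
    converged: bool


def debate(task: str, initial_plan: str, agents: List[Agent],
           revise: Callable[[str, List[str]], str],
           max_rounds: int = 5) -> DebateResult:
    """Cross-examine a plan until no agent objects (assumed convergence rule)."""
    plan = initial_plan
    for round_no in range(1, max_rounds + 1):
        critiques = [c for c in (agent(task, plan) for agent in agents) if c]
        if not critiques:
            return DebateResult(plan, round_no, True)
        plan = revise(plan, critiques)
    return DebateResult(plan, max_rounds, False)


def counterfactual_check(plan: str, perturbations: List[str],
                         survives: Callable[[str, str], bool]) -> List[str]:
    """Return the perturbations that the plan fails to withstand."""
    return [p for p in perturbations if not survives(plan, p)]


# Deterministic stand-ins for the three roles (assumptions, not LLM calls).
def optimist(task: str, plan: str) -> str:
    return ""  # never objects in this toy setup


def pragmatist(task: str, plan: str) -> str:
    return "" if "complexity" in plan else "state the complexity"


def skeptic(task: str, plan: str) -> str:
    return "" if "empty input" in plan else "handle empty input"


def revise(plan: str, critiques: List[str]) -> str:
    # Toy revision: append each critique to the plan as a new requirement.
    return plan + "; " + "; ".join(critiques)


result = debate("find the largest gap in a list", "sort then scan",
                [optimist, pragmatist, skeptic], revise)
failures = counterfactual_check(
    result.plan,
    ["empty input", "negative numbers"],
    survives=lambda plan, p: p in plan,
)
print(result)
print(failures)
```

In this toy run the debate only addresses the objections the stub agents happen to raise, so the verification stage still reports a perturbation the plan does not cover. That is the role the abstract assigns to the adversarial verification module. How CodeForge actually generates perturbations or feeds failures back into the plan is not specified in the abstract.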