WWW 2026

Rethinking Implicit Hate Speech Detection: Focusing on Latent Hate Components via Dual-Process Argumentation

Shiqi Sun, Du Su, Wei Chen, Xueqi Cheng

Abstract

Implicit hate speech often hides harmful intent behind innocuous wording, metaphors, or hostile tone, making it difficult for detectors that rely on surface cues. We observe that large language models (LLMs) frequently exhibit pseudo-reasoning: they are over-sensitive to spurious cues while missing the latent semantic units that actually realize hateful intent. We call these units Latent Hate Components (LHCs) and argue that they should be the anchors of inference. We propose DuPL, a Dual-Process argumentation framework that centers detection on LHCs. DuPL separates (i) Mining of LHCs, via a high-recall Critical Miner and a Confusion Judge that exits early on clear cases, from (ii) Deliberation over LHCs, via component-wise argumentation and a final Integrative Decision. Across IHC, SBIC, and ToxiGen, DuPL consistently outperforms prompt-learning baselines, improving accuracy by 8.36, 4.93, and 3.78 percentage points and macro-F1 by 7.18, 5.00, and 3.99 percentage points, respectively. DuPL also lowers both the false positive rate and the false negative rate in most settings, indicating balanced mitigation of the two failure modes common in LLM detectors. By explicitly mining and deliberating over LHCs, DuPL turns opaque, uncontrolled reasoning into structured argumentation, yielding more accurate and interpretable decisions for web moderation.
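
To make the two-stage control flow concrete, the following is a minimal Python sketch of how Mining and Deliberation might be wired together. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify prompts, interfaces, or what the Confusion Judge returns, so every name here (`dupl_sketch`, `Verdict`, and the four callables `mine`, `judge`, `argue`, `decide`) is hypothetical, and the toy stand-ins at the bottom replace what would be LLM calls in practice.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    label: str              # "hate" or "not_hate"
    components: List[str]   # candidate LHCs surfaced by the miner
    arguments: List[str]    # one argument per component (empty on early exit)
    early_exit: bool        # True if the judge resolved the case without deliberation


def dupl_sketch(
    text: str,
    mine: Callable[[str], List[str]],
    judge: Callable[[str, List[str]], Optional[str]],
    argue: Callable[[str, str], str],
    decide: Callable[[str, List[str], List[str]], str],
) -> Verdict:
    # Stage 1, Mining: a high-recall miner proposes candidate components.
    components = mine(text)
    # Assumption: the judge returns a label for clear cases and None otherwise.
    clear_label = judge(text, components)
    if clear_label is not None:
        return Verdict(clear_label, components, [], True)
    # Stage 2, Deliberation: argue about each component, then integrate.
    arguments = [argue(text, c) for c in components]
    label = decide(text, components, arguments)
    return Verdict(label, components, arguments, False)


# Toy stand-ins (hypothetical): tokens wrapped in <...> play the role of LHCs.
def toy_mine(text: str) -> List[str]:
    return [t for t in text.split() if t.startswith("<") and t.endswith(">")]


def toy_judge(text: str, components: List[str]) -> Optional[str]:
    # Treat "no candidate components" as a clear negative; defer otherwise.
    return "not_hate" if not components else None


def toy_argue(text: str, component: str) -> str:
    return f"{component} may carry hostile intent in this context"


def toy_decide(text: str, components: List[str], arguments: List[str]) -> str:
    return "hate" if arguments else "not_hate"


clear = dupl_sketch("nice weather today", toy_mine, toy_judge, toy_argue, toy_decide)
unclear = dupl_sketch("those <coded-term> again", toy_mine, toy_judge, toy_argue, toy_decide)
```

In this sketch the first input exits early with no arguments produced, while the second passes through component-wise argumentation before a label is assigned. Passing the four stages as callables is a design choice of the sketch only, made so that each stage can be swapped or tested in isolation.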