WWW2026

Enhancing Content Moderation with LLMs: A Reddit Case Study on Evaluating and Refining Human Decisions

Jiahui He, Yiluo Wei, Gareth Tyson

Abstract

Large Language Models (LLMs) offer significant potential for assisting with the design and implementation of social platform moderation. This study evaluates their efficacy both as a replacement for and as an augmentation of human moderators. Using Reddit as a case study, we first demonstrate that LLMs can effectively replicate human moderation decisions, achieving 83.9% agreement. Through a mix of LLM and human annotations, we then evaluate real moderator decisions and uncover substantial error rates: an estimated 15.2% of removals and 13% of approvals are incorrect, primarily because moderators cite the wrong rule (84.3% of errors). This motivates us to propose RuleSharpener, a tool that uses LLMs to diagnose the root causes of moderation errors (e.g., ambiguous rules) and generate clearer, more actionable guidelines. Our evaluation shows that RuleSharpener increases the accuracy of identifying the specific rules broken by violating posts by 38.0%. Our work demonstrates how LLMs can augment human moderation, refine community policies, and reduce operational burdens, offering a more effective approach to platform governance on the web.