EMNLP 2024

Leveraging Conflicts in Social Media Posts: Unintended Offense Dataset

Che-Wei Tsai, Yen-Hao Huang, Tsu-Keng Liao, Didier Estrada, Retnani Latifah, Yi-Shin Chen

Abstract

Conflicts often arise in multi-person communication, where each participant may hold a different perspective. However, commonly used offensive-language datasets frequently neglect contextual information and are constructed primarily around intended offenses. This study suggests that conflicts are pivotal in revealing a broader range of human interactions, including instances of unintended offensive language. We propose a conflict-based data collection method that utilizes inter-conflict cues in multi-person communications. By focusing on specific cue posts within conversation threads, the proposed approach effectively identifies relevant instances for analysis. Detailed analyses show that the proposed approach efficiently gathers data on subtly offensive content. The experimental results indicate that incorporating elements of conflict into data collection not only significantly enhances the comprehensiveness and accuracy of offensive language detection but also enriches our understanding of conflict dynamics in digital communication.