ACL 2024

Biasly: An Expert-Annotated Dataset for Subtle Misogyny Detection and Mitigation

Brooklyn Sheppard, Anna Richter, Allison Cohen, Elizabeth Allyn Smith, Tamara Kneese, Carolyne Pelletier, Ioana Baldini, Yue Dong

Abstract

Using novel approaches to dataset development, the Biasly dataset captures the nuance and subtlety of misogyny in ways that are unique within the literature. Built in collaboration with multi-disciplinary experts and annotators themselves, the dataset contains annotations of movie subtitles, capturing colloquial expressions of misogyny in North American film. The open-source dataset can be used for a range of NLP tasks, including binary and multi-label classification, severity score regression, and text generation for rewrites. In this paper, we discuss the methodology used, analyze the annotations obtained, provide baselines for each task using common NLP algorithms, and furnish error analyses to give insight into model behaviour when fine-tuned on the Biasly dataset.

Content Warning: To illustrate examples from our dataset, misogynistic language is used which may be offensive or upsetting.

1 Introduction and Related Work

When using language models (LMs) to perform sensitive, subjective, and socially impactful tasks like misogyny detection, hate speech mitigation, or