EMNLP 2022

Stanceosaurus: Classifying Stance Towards Multicultural Misinformation

Jonathan Zheng, Ashutosh Baheti, Tarek Naous, Wei Xu, Alan Ritter

Abstract

We present Stanceosaurus, a new corpus of 28,033 tweets in English, Hindi, and Arabic annotated with stance towards 251 misinformation claims. As far as we are aware, it is the largest corpus annotated with stance towards misinformation claims. The claims in Stanceosaurus originate from 15 fact-checking sources that cover diverse geographical regions and cultures. Unlike existing stance datasets, we introduce a more fine-grained 5-class labeling strategy with additional subcategories to distinguish implicit stance. Pretrained transformer-based stance classifiers that are fine-tuned on our corpus show good generalization on unseen claims and regional claims from countries outside the training data. Cross-lingual experiments demonstrate Stanceosaurus' capability of training multilingual models, achieving 53.1 F1 on Hindi and 50.4 F1 on Arabic without any target-language fine-tuning. Finally, we show how a domain adaptation method can be used to improve performance on Stanceosaurus using additional RumourEval-2019 data. We make Stanceosaurus publicly available to the research community and hope it will encourage further work on misinformation identification across languages and cultures.

| Dataset | Target | Number/Range of Topics |
| --- | --- | --- |
| SemEval-2016 (Mohammad et al., 2016) | Subject | 6 political topics (e.g., atheism, feminist movement) |
| SRQ (Villa-Cox et al., 2020) | Subject | 4 political topics & events (e.g., general terms, student marches) |
| Catalonia (Zotova et al., 2020) | Subject | 1 topic (i.e., Catalonia independence) |
| COVID (Glandt et al., 2021) | Subject | 4 topics related to Covid-19 (e.g., stay at home orders) |
| Multi-target (Sobhani et al., 2017) | Entity | 3 pairs of candidates in 2016 US election |
| WTWT (Conforti et al., 2020) | Event | 5 merger and acquisition events |
| RumourEval (Gorrell et al., 2019) | Tweet | 8 news events + rumors about natural disasters |
| Rumor-has-it (Qazvinian et al., 2011) | Claim | 5 rumors (e.g., Sarah Palin getting divorced?) |
| CovidLies (Hossain et al., 2020) | Claim | 86 pieces of COVID-19 misinformation |
| Stanceosaurus (this work) | Claim | 251 claims over a diverse set of global and regional topics |

Table 1: Summary of Twitter stance classification datasets. Stanceosaurus covers more claims from a broader range of topics and geographical regions than prior Twitter stance datasets.
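
The abstract reports F1 scores for a 5-class stance classifier. As a rough illustration of how such a score can be computed, the sketch below averages per-class F1 without weighting (macro F1). Two things here are assumptions rather than details from the abstract: that the reported F1 is a macro average, and the five label names, which are placeholders. The function name `macro_f1` is likewise hypothetical.

```python
from collections import Counter

# Placeholder label names (an assumption): the abstract only states that
# the labeling scheme has five classes, not what they are called.
LABELS = ["supporting", "refuting", "discussing", "querying", "irrelevant"]


def macro_f1(gold, pred, labels=LABELS):
    """Return the unweighted mean of per-class F1 over `labels`."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true class but was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # A class with no gold and no predicted examples scores 0 here.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy example with six tweets; these labels are made up for illustration.
gold = ["supporting", "refuting", "discussing", "querying", "irrelevant", "irrelevant"]
pred = ["supporting", "refuting", "querying", "querying", "irrelevant", "supporting"]
score = macro_f1(gold, pred)
```

Macro averaging gives every class the same weight regardless of how many tweets carry it, so rare stance classes count as much as frequent ones when models are compared.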