ACL2021
CMTA: COVID-19 Misinformation Multilingual Analysis on Twitter
Raj Ratn Pranesh, Mehrdad Farokhnejad, Ambesh Shekhar, Genoveva Vargas-Solar
Abstract
During the current coronavirus disease pandemic, the Internet has become an important source of health information for users worldwide. In pandemic situations, myths, sensationalism, rumors, and misinformation, generated intentionally or unintentionally, spread rapidly through social networks. Twitter is one of the popular social networks that people use to share COVID-19-related news, information, and thoughts reflecting their perceptions and opinions about the pandemic. Analyzing tweets to identify misinformation can yield valuable insight into the quality and readability of online information about COVID-19. This paper presents CMTA, a multilingual COVID-19-related tweet analysis method that uses BERT, a deep learning model, for multilingual tweet misinformation detection and classification. CMTA extracts features from multilingual textual data, which are then categorized into specific information classes. Classification is done by a Dense-CNN model trained on tweets manually annotated into information classes (i.e., 'false', 'partly false', 'misleading'). The paper assesses CMTA through an experimental analysis of multilingual tweets posted from February to June, showing how the types of information are distributed across languages.
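As an illustration of the final analysis step only, the following minimal sketch shows one way the distribution of information classes across languages could be tabulated once each tweet has a predicted class. It is not the authors' implementation: the function name `class_distribution_by_language`, the language codes, and the toy predictions are all hypothetical and were invented for this example. Only the three class names come from the abstract.

```python
from collections import Counter, defaultdict

# Information classes named in the abstract.
CLASSES = ("false", "partly false", "misleading")


def class_distribution_by_language(predictions):
    """Return {language: {class: fraction}} from (language, class) pairs.

    Hypothetical helper: it assumes a classifier has already assigned
    one of CLASSES to every tweet.
    """
    counts = defaultdict(Counter)
    for language, label in predictions:
        if label not in CLASSES:
            raise ValueError(f"unknown class: {label!r}")
        counts[language][label] += 1

    distribution = {}
    for language, counter in counts.items():
        total = sum(counter.values())
        # Classes never predicted for a language get a fraction of 0.0.
        distribution[language] = {c: counter[c] / total for c in CLASSES}
    return distribution


# Toy classifier outputs, invented purely for illustration (not real data).
toy_predictions = [
    ("en", "false"),
    ("en", "misleading"),
    ("en", "false"),
    ("en", "partly false"),
    ("es", "partly false"),
    ("es", "partly false"),
    ("fr", "misleading"),
    ("fr", "false"),
]

dist = class_distribution_by_language(toy_predictions)
for language in sorted(dist):
    print(language, dist[language])
```

Normalizing counts to fractions within each language makes languages with very different tweet volumes directly comparable, which is the kind of cross-language comparison the abstract describes.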