ACL2025

Detection of Human and Machine-Authored Fake News in Urdu

Muhammad Zain Ali, Yuxia Wang, Bernhard Pfahringer, Tony C. Smith

Abstract

Fake news presents misleading information as legitimate news in order to influence public opinion and deceive readers. Fake news detection techniques distinguish fake news from real news, which carries credible information. These techniques analyze linguistic patterns in the text, contextual inconsistencies in user responses, and propagation behavior on social networks. Unlike high-resource languages, Urdu has only limited basic tooling, which restricts the application of state-of-the-art machine learning models to Urdu-based challenges. As a result, the available approaches to fake news detection in Urdu do not perform well on benchmark datasets. They often rely on bag-of-words representations, in which n-gram features are encoded as frequency-based sparse vectors; such representations are inadequate for detecting linguistic indicators of legitimacy in news. In this paper, we propose a methodology that uses Urduhack text preprocessing tools to prepare the data, Urdu embeddings to represent the news text as dense vectors, and a long short-term memory (LSTM) based deep sequence model to classify news as fake or real. The proposed methodology outperforms traditional machine learning approaches in identifying linguistic characteristics and using them for decision-making, and achieves considerable performance gains, with accuracies of 85% and 83% on the Bend the Truth (BET) and Urdu fake news (UFN) datasets, respectively.
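To make the three stages of the proposed pipeline concrete (preprocessing, dense embeddings, LSTM classification), the following is a minimal illustrative sketch in plain Python, not the implementation evaluated in the paper. Everything in it is an assumption made for illustration: the `tokenize` helper is a whitespace stand-in for Urduhack preprocessing, the embeddings and weights are random and untrained rather than pretrained Urdu embeddings, and the name `TinyLSTMClassifier`, the dimensions, and the single-layer architecture are hypothetical.

```python
import math
import random


def tokenize(text):
    # Assumption: whitespace splitting stands in for Urduhack
    # normalization and tokenization, which a real pipeline would use.
    return text.split()


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMClassifier:
    """Untrained toy model: embedding lookup -> one LSTM layer -> sigmoid."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, vocab, embed_dim=4, hidden_dim=3, seed=0):
        rng = random.Random(seed)

        def rand_vec(n):
            return [rng.uniform(-0.5, 0.5) for _ in range(n)]

        # Index 0 is reserved for out-of-vocabulary tokens.
        self.index = {tok: i + 1 for i, tok in enumerate(vocab)}
        # Assumption: random vectors replace pretrained Urdu embeddings.
        self.embeddings = [rand_vec(embed_dim) for _ in range(len(vocab) + 1)]
        self.hidden_dim = hidden_dim
        # For each gate, one weight row per hidden unit over [x_t ; h_{t-1}].
        self.weights = {
            gate: [rand_vec(embed_dim + hidden_dim) for _ in range(hidden_dim)]
            for gate in self.GATES
        }
        self.biases = {gate: [0.0] * hidden_dim for gate in self.GATES}
        self.out_weights = rand_vec(hidden_dim)
        self.out_bias = 0.0

    def _gate(self, name, concat, activation):
        return [
            activation(sum(w * v for w, v in zip(row, concat)) + b)
            for row, b in zip(self.weights[name], self.biases[name])
        ]

    def encode(self, tokens):
        """Run the LSTM over the token sequence; return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for tok in tokens:
            x = self.embeddings[self.index.get(tok, 0)]
            concat = x + h  # list concatenation: [x_t ; h_{t-1}]
            i = self._gate("input", concat, sigmoid)
            f = self._gate("forget", concat, sigmoid)
            o = self._gate("output", concat, sigmoid)
            g = self._gate("candidate", concat, math.tanh)
            # Standard LSTM update of the cell state and hidden state.
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h

    def predict_proba(self, text):
        """Return a score in (0, 1); meaningless until the model is trained."""
        h = self.encode(tokenize(text))
        logit = sum(w * v for w, v in zip(self.out_weights, h)) + self.out_bias
        return sigmoid(logit)


if __name__ == "__main__":
    # Toy vocabulary and sentence, chosen only for illustration.
    model = TinyLSTMClassifier(["یہ", "خبر", "جعلی", "ہے"])
    print(model.predict_proba("یہ خبر جعلی ہے"))
```

Because the hidden state is updated token by token, the final representation depends on word order, which a frequency-based bag-of-words vector discards. In practice the embeddings and gate weights would be learned from labeled data, typically with a deep learning framework rather than hand-written loops.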