ACL2024

A Survey on Predicting the Factuality and the Bias of News Media

Preslav Nakov, Jisun An, Haewoon Kwak, Muhammad Arslan Manzoor, Zain Muhammad Mujahid, Husrev T. Sencar

Abstract

The present level of proliferation of fake, biased, and propagandistic content online has made it impossible to fact-check every single suspicious claim or article, either manually or automatically. Thus, many researchers are shifting their attention to higher granularity, aiming to profile entire news outlets, which makes it possible to detect likely "fake news" the moment it is published, by simply checking the reliability of its source. Source factuality is also an important element of systems for automatic fact-checking and "fake news" detection, as they need to assess the reliability of the evidence they retrieve online. Political bias detection, which in the Western political landscape is about predicting left-center-right bias, is an equally important topic, which has experienced a similar shift towards profiling entire news outlets. Moreover, there is a clear connection between the two, as highly biased media are less likely to be factual; yet, the two problems have been addressed separately. In this survey, we review the state of the art on media profiling for factuality and bias, arguing for the need to model them jointly. We further discuss interesting recent advances in using different information sources and modalities, which go beyond the text of the articles the target news outlet has published. Finally, we discuss current challenges and outline future research directions.

The issue became a general concern in 2016, a year marked by micro-targeted online disinformation at an unprecedented scale in connection to Brexit and the US Presidential election. These developments gave rise to the term "fake news." In an attempt to solve the trust problem, several initiatives, such as PolitiFact, Snopes, FactCheck, and Full Fact, have been launched to fact-check suspicious claims manually.
However, given the scale of the proliferation of false information online, it was unfeasible to fact-check every single suspicious claim, even when this was done automatically, not only for computational reasons but also due to timing. In order to fact-check a claim manually or automatically, it is required to verify the stance of mainstream media concerning that claim and/or the reaction of users on social media. Accumulating this evidence takes time, and delay means more potential sharing of the malicious content. A study has shown that, for some very viral claims, more than 50% of the sharing happens within the first ten minutes after posting the micro-post on social media (Zaman et al., 2014), and thus timing is of utmost importance. Moreover, an extensive recent study has found that "fake news" spreads six times faster and reaches much farther than real news (Vosoughi et al., 2018).

A much more promising alternative is to profile the medium that initially published the news article with a suspicious claim. Since media that have published fake or biased content in the past are more likely to do so in the future, profiling media in advance makes it possible to detect likely "fake news" the moment it is published by simply checking the reliability of its source.

Estimating the reliability of a news source is important for claim fact-checking (Nguyen et al., 2018), and it also gives an important prior when solving article-level tasks such as "fake news"¹ and click-bait detection (Hardalov et al., 2016;