ACL 2025

Evaluating Credibility and Political Bias in LLMs for News Outlets in Bangladesh

Tabia Tanzin Prama, Md. Saiful Islam

Abstract

Large language models (LLMs) are widely used in search engines to provide direct answers, while AI chatbots retrieve updated information from the web. As these systems influence how billions of people access information, evaluating how they judge the credibility of news outlets has become crucial. We audit nine LLMs from OpenAI, Google, and Meta to assess their ability to evaluate the credibility and political bias of the 20 most popular news outlets in Bangladesh. Most LLMs rate the tested outlets, but larger models often refuse to rate sources, citing insufficient information, whereas smaller models are more prone to hallucinations. We create a dataset of credibility ratings and political identities based on journalism experts’ opinions and compare these with LLM responses. We find strong internal consistency in LLM credibility ratings, with an average correlation coefficient (ρ) of 0.72, but only moderate alignment with expert evaluations, with an average ρ of 0.45. Most LLMs (GPT-4, GPT-4o-mini, Llama 3.3, Llama 3.1 70B, Llama 3.1 8B, and Gemini 1.5 Pro) in their default configurations favor the left-leaning Bangladesh Awami League, giving higher credibility ratings, and are misaligned with human experts. These findings highlight the significant role of LLMs in shaping news and political information.
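The two averages reported above (agreement among models, and agreement between models and experts) can be illustrated with a small sketch. It assumes that ρ is a Spearman rank correlation, which the abstract does not state, and every rating, outlet count, and model name below is made up for illustration; none comes from the paper's dataset.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rho computed as the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical credibility ratings for five outlets (not data from the paper).
expert = [0.9, 0.7, 0.5, 0.3, 0.1]
models = {
    "model_a": [0.8, 0.9, 0.4, 0.3, 0.2],  # hypothetical model name
    "model_b": [0.7, 0.8, 0.5, 0.2, 0.3],  # hypothetical model name
}

# Average pairwise agreement among models (internal consistency).
inter_model_rho = mean(
    spearman(models[a], models[b]) for a, b in combinations(models, 2)
)

# Average agreement between each model and the expert ratings.
expert_alignment_rho = mean(spearman(r, expert) for r in models.values())
```

Under this reading, `inter_model_rho` corresponds to the internal-consistency figure and `expert_alignment_rho` to the expert-alignment figure; if the authors used a different coefficient, only `spearman` would need to change.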