ACL 2025

Palm: A Culturally Inclusive and Linguistically Diverse Dataset for Arabic LLMs

Fakhraddin Alwajih, Abdellah El Mekki, Samar Mohamed Magdy, AbdelRahim A. Elmadany, Omer Nacar, El Moatez Billah Nagoudi, Reem Abdel-Salam, Hanin Atwany, Youssef Nafea, Abdulfattah Mohammed Yahya, Rahaf Alhamouri, Hamzah A. Alsayadi, Hiba Zayed, Sara Shatnawi, Serry Sibaee, Yasir Ech-Chammakhy, Walid Al-Dhabyani, Marwa Mohamed Ali, Imen Jarraya, Ahmed Oumar El-Shangiti, Aisha Alraeesi, Mohammed Anwar Al-Ghrawi, Abdulrahman S. Al-Batati, Elgizouli Mohamed, Noha Taha Elgindi, Muhammed Saeed, Houdaifa Atou, Issam Ait Yahia, Abdelhak Bouayad, Mohammed Machrouh, Amal Makouar, Dania Alkawi, Mukhtar Mohamed, Safaa Taher Abdelfadil, Amine Ziad Ounnoughene, Rouabhia Anfel, Rwaa Assi, Ahmed Sorkatti, Mohamedou Cheikh Tourad, Anis Koubaa, Ismail Berrada, Mustafa Jarrar, Shady Shehata, Muhammad Abdul-Mageed

Abstract

As large language models (LLMs) become increasingly integrated into daily life, ensuring their cultural sensitivity and inclusivity is paramount. We introduce PALM, a year-long community-driven project covering all 22 Arab countries. The dataset includes instructions (input-response pairs) in both Modern Standard Arabic (MSA) and dialectal Arabic (DA), spanning 20 diverse topics. Built by a team of 44 researchers across the Arab world, all of whom are authors of this paper, PALM offers a broad, inclusive perspective. We use PALM to evaluate the cultural and dialectal capabilities of several frontier LLMs, revealing notable limitations. For instance, while closed-source LLMs generally exhibit strong performance, they are not without flaws, and smaller open-source models face greater challenges. Moreover, certain countries (e.g., Egypt, the UAE) appear better represented than others (e.g., Iraq, Mauritania, Yemen). Our annotation guidelines, code, and data for reproducibility are publicly available. More information about PALM is available at our project page (https://github.com/UBC-NLP/palm).

LLMs have been pre-trained for Arabic, including Jasmine (Billah Nagoudi et al., 2023), JAIS (Sengupta et al., 2023), AceGPT (Huang et al., 2024), ALLAM (Bari et al., 2024), Fanar (Team et al., 2025), and NileChat (El Mekki et al., 2025). These models demonstrate powerful capabilities in generating Arabic across its different forms. However, the instruction-tuning datasets used for some of these models (such as JAIS and AceGPT) are predominantly machine-generated or machine-translated, resulting in instructions that are not related to Arab culture. In addition, most of these models have been evaluated on general NLP tasks rather than on the cultures and dialects of specific Arab countries, so their country-level cultural awareness across all Arab countries remains largely untested.
Our work addresses this need by providing a large dataset of Arabic instructions to ensure better cultural representation of Arab communities. More specifically, we introduce PALM, the first comprehensive fully human-created Arabic instruction dataset that is both culturally and linguistically diverse and inclusive. PALM is the first dataset at the country level to cover all 22 Arab countries, spanning 20 culturally relevant topics. What sets PALM apart is its inclusion of instructions in both MSA and local dialects, all of which are human-annotated using reliable, country-specific sources. This dataset was developed through a large community-driven project, leveraging local expertise and collective knowledge. PALM serves a dual purpose: it can be used for cultural and dialectal instruction tuning of LLMs, as well as for evaluating their cultural competence regarding the Arab world. We offer the following contributions: 1. We present PALM, a novel dataset developed