ACL 2024

Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts

Xuan-Phi Nguyen, Mahani Aljunied, Shafiq Joty, Lidong Bing

Abstract

Large language models (LLMs) are known to perform tasks after observing only a few exemplars. However, their competent generative capabilities are observed mostly in high-resource languages, while their performance in under-represented languages lags behind due to pre-training data imbalance. To elicit LLMs' abilities in low-resource languages without any supervised data, we propose to assemble synthetic exemplars from a diverse set of high-resource languages into prompts. These prompts can directly induce generative capabilities in low-resource languages and can also serve as intra-lingual exemplars that further improve tasks in these languages. Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translation between English and 34 Indic and African languages, and surpasses supervised prompting in non-English tasks. The method also significantly improves low-resource performance in many other intra-lingual tasks such as summarization (XLSum), question answering (XQuAD and TyDi QA), and conversational instruction following (Sea-Bench).
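As a rough illustration of what assembling exemplars from several high-resource languages into a single prompt could look like, consider the minimal sketch below. It is not the authors' implementation: the function name `build_prompt`, the bracketed language-tag format, and the example sentences are all assumptions made for illustration, and in the setting described above the exemplars would be synthetic rather than hand-written.

```python
from typing import List, Tuple

# Hypothetical exemplars: (language, source sentence, English translation).
# Placeholders only; in the abstract's setting these would be synthetic
# (model-generated) rather than human-annotated pairs.
EXEMPLARS: List[Tuple[str, str, str]] = [
    ("French", "Bonjour, comment allez-vous ?", "Hello, how are you?"),
    ("German", "Das Wetter ist heute schön.", "The weather is nice today."),
    ("Spanish", "Me gusta leer libros.", "I like reading books."),
]


def build_prompt(
    exemplars: List[Tuple[str, str, str]],
    query_lang: str,
    query_text: str,
) -> str:
    """Concatenate diverse-language exemplars, then append the open query.

    The tag format "[Language] text" is an assumed convention for this
    sketch, not a format taken from the paper.
    """
    blocks = [
        f"[{lang}] {src}\n[English] {tgt}" for lang, src, tgt in exemplars
    ]
    # The final block leaves the English side empty for the model to fill in.
    blocks.append(f"[{query_lang}] {query_text}\n[English]")
    return "\n\n".join(blocks)


# Example usage with a placeholder low-resource query sentence.
prompt = build_prompt(EXEMPLARS, "Swahili", "Habari ya asubuhi.")
print(prompt)
```

The resulting string would be passed to an LLM as a few-shot prompt. The point of the sketch is only that no supervised pair in the target language is required to build it.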