ACL 2024

All Languages Matter: On the Multilingual Safety of LLMs

Wenxuan Wang, Zhaopeng Tu, Chang Chen, Youliang Yuan, Jen-tse Huang, Wenxiang Jiao, Michael R. Lyu

Abstract

Ensuring safety is fundamental when developing and deploying large language models (LLMs). However, previous safety benchmarks cover only one language, typically the majority language in the pretraining data, such as English. In this work, we build the first multilingual safety benchmark for LLMs, XSAFETY, in response to the global deployment of LLMs in practice. XSAFETY covers 14 commonly used safety issues across ten languages spanning several language families. We use XSAFETY to empirically study the multilingual safety of four widely used LLMs, including closed-source APIs and open-source models. Experimental results show that all LLMs produce significantly more unsafe responses for non-English queries than for English ones, indicating the necessity of developing safety alignment for non-English languages. In addition, we propose a simple and effective prompting method to improve ChatGPT's multilingual safety by enhancing cross-lingual generalization of safety alignment. Our prompting method reduces the ratio of unsafe responses by 42% for non-English queries. We release the data to facilitate future research on LLM safety¹.

* Work was done when Wenxuan Wang, Youliang Yuan, and Jen-tse Huang were interning at Tencent AI Lab.
† Jen-tse Huang is the corresponding author.
¹ Our dataset is released at https://github.com/Jarviswang94/Multilingual_safety_benchmark