EMNLP 2024

The Greatest Good Benchmark: Measuring LLMs' Alignment with Utilitarian Moral Dilemmas

Giovanni Marraffini, Andrés Cotton, Noe Hsueh, Axel Fridman, Juan Wisznia, Luciano Del Corro

Cited by 3

Abstract

The question of how to make decisions that maximise the well-being of all persons is highly relevant to designing language models that are beneficial to humanity and free from harm. We introduce the Greatest Good Benchmark (GGB) to evaluate LLMs' moral judgments using utilitarian dilemmas. Our framework enables a direct comparison between the moral preferences of LLMs and humans, contributing to a deeper understanding of LLMs' alignment with human moral values. Analyzing 15 diverse models, we uncover consistent moral preferences that diverge from established moral theories and lay population moral standards. Specifically, most LLMs exhibit a strong inclination toward impartial beneficence and a rejection of instrumental harm. These findings showcase the 'artificial moral compass' of LLMs, offering insights into their moral alignment.