ACL 2025

Exploring LLMs' Ability to Spontaneously and Conditionally Modify Moral Expressions through Text Manipulation

Candida Maria Greco, Lucio La Cava, Lorenzo Zangari, Andrea Tagarelli

Abstract

Morality serves as the foundation of societal structure, guiding legal systems, shaping cultural values, and influencing individual self-perception. With the rise and pervasiveness of generative AI tools, and particularly Large Language Models (LLMs), concerns arise regarding how these tools capture and potentially alter moral dimensions through machine-generated text manipulation. Based on Moral Foundations Theory, our work investigates this topic by analyzing the behavior of 12 LLMs, chosen among the most widely used open and uncensored (i.e., "abliterated") models, and by leveraging human-annotated datasets used in moral-related analysis. Results show varying levels of alteration of moral expressions, depending on the type of text modification task and on the moral-related conditioning prompt.