ICLR2025
Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements
Jingyu Zhang, Ahmed Elgohary, Ahmed Magooda, Daniel Khashabi, Benjamin Van Durme
摘要
Example Configs -Game Development Firm (excerpt) We are a game development firm specializing in a broad range of games ...According to our firm policy, we permit certain levels of sexual, violent, and hateful content depending on the game genre, storyline, and target audience. Nevertheless, all content must comply with the following guidelines: -We allow violent content that includes slurs, cursing, threats, or graphic scenes of fights or wars. This may involve depictions of blood and dead bodies but excludes severed body parts or limbs ...