AAAI 2025

All You Need Is S P A C E: When Jailbreaking Meets Bias Audit and Reveals What Lies Beneath the Guardrails (Student Abstract)

Arka Dutta, Aman Priyanshu, Ashiqur R. KhudaBukhsh

Cited by 2

Abstract

This paper presents a novel combination of a recently proposed bias audit framework and a recently proposed jailbreaking technique for Llama3. On an audit covering several disadvantaged groups, our experiments reveal that a jailbroken Llama3 exhibits worrisome antisemitism, racism, misogyny, and homophobia (to list a few), much like a broad suite of LLMs that were susceptible to similar biases.