AAAI 2025
All You Need Is S P A C E: When Jailbreaking Meets Bias Audit and Reveals What Lies Beneath the Guardrails (Student Abstract)
Arka Dutta, Aman Priyanshu, Ashiqur R. KhudaBukhsh
Cited by 2
Abstract
This paper presents a novel combination of a recently proposed bias audit framework and a recently proposed jailbreaking technique for Llama3. On an audit covering several disadvantaged groups, our experiments reveal that a jailbroken Llama3 exhibits worrisome antisemitism, racism, misogyny, and homophobia (to list a few), much like a broad suite of LLMs that were susceptible to similar biases.