EMNLP 2025

Foot-In-The-Door: A Multi-turn Jailbreak for LLMs

Zixuan Weng, Xiaolong Jin, Jinyuan Jia, Xiangyu Zhang

Cited by 2

Abstract

Ensuring AI safety is crucial as large language models become increasingly integrated into real-world applications. A key challenge is jailbreaking, where adversarial prompts bypass built-in safeguards to elicit harmful, disallowed outputs. Inspired by the psychological foot-in-the-door principle, we introduce FITD, a novel multi-turn jailbreak method that leverages the phenomenon where minor initial commitments lower resistance to more significant or more unethical transgressions. Our approach progressively escalates the malicious intent of user queries through intermediate bridge prompts and aligns the model's response by itself to induce toxic responses. Extensive experimental results on two jailbreak benchmarks demonstrate that FITD achieves an average attack success rate of 94% across seven widely used models, outperforming existing state-of-the-art methods. Additionally, we provide an in-depth analysis of LLM self-corruption, highlighting vulnerabilities in current alignment strategies and emphasizing the risks inherent in multi-turn interactions. The code is available at https://github.com/Jinxiaolong1129/Foot-in-the-door-Jailbreak.

Responsible Disclosure: We have shared our findings with OpenAI and Meta and discussed the ethical implications.

⋆ Equal contribution

[Figure: an example multi-turn conversation in which successive prompts escalate in harmfulness, contrasted with the model's refusal when the harmful request is asked directly. Axis label: Harmfulness.]