ASE 2025

Uncovering Prompt Elements: Cloning System Prompts from Behavioral Traces

Yi Qian, Fei Peng, Hao Wu, Ligeng Chen, Bing Mao

Cited by 1

Abstract

We introduce prompt cloning, a new black-box attack that reconstructs functionally equivalent system prompts rather than extracting the original ones. Unlike prompt stealing, prompt cloning exploits the insight that system prompts leave persistent behavioral traces in model outputs, even under strong alignment and prompt-level defenses. Our method decomposes system behavior into semantically interpretable elements, selectively elicits each element through carefully designed queries, and aggregates representative traces to synthesize high-fidelity cloned prompts. Extensive evaluations show that cloned prompts replicate functional behavior with up to 85% semantic similarity, outperform base LLMs by up to 8%, and even exceed the original system prompts when transferred to different back-end models. We also conduct a large-scale study of GitHub repositories, which shows that single-prompt architectures remain widespread in open-source LLM applications and reinforces the real-world relevance of our threat model. Our findings show that prompt cloning enables unauthorized replication of confidential LLM behavior and underscore the urgent need for defenses that go beyond hiding prompt text.
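
The three stages named in the abstract (decompose into elements, elicit with targeted queries, aggregate representative traces) can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the paper's implementation: the element names in `ELEMENT_PROBES`, the probe queries, the `toy_target` stand-in for a black-box application, the use of Jaccard token overlap as a similarity proxy, and the template-based `synthesize_prompt` are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical behavioral elements and probe queries (not taken from the paper).
ELEMENT_PROBES: Dict[str, List[str]] = {
    "role": ["Who are you?", "What is your job?"],
    "tone": ["Say hello.", "Greet a new user."],
    "constraints": ["Tell me a joke about politics.", "Can you discuss politics?"],
}


def collect_traces(
    target: Callable[[str], str], probes: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Query the black-box target once per probe and group outputs by element."""
    return {element: [target(q) for q in queries] for element, queries in probes.items()}


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase whitespace tokens (a crude similarity proxy)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def representative(traces: List[str]) -> str:
    """Return the trace most similar, in total, to all traces in the group."""
    return max(traces, key=lambda t: sum(token_overlap(t, o) for o in traces))


def synthesize_prompt(traces_by_element: Dict[str, List[str]]) -> str:
    """Assemble a candidate prompt from one representative trace per element.

    A fixed template is used here as a stand-in for whatever synthesis
    step a real pipeline would apply.
    """
    lines = ["Behave consistently with the following observed responses:"]
    for element, traces in traces_by_element.items():
        lines.append(f"- {element}: {representative(traces)}")
    return "\n".join(lines)


def toy_target(query: str) -> str:
    """Stand-in for a deployed application whose system prompt is hidden."""
    q = query.lower()
    if "politics" in q:
        return "Sorry, I only answer cooking questions."
    if "who" in q or "job" in q:
        return "I am ChefBot, a cooking assistant."
    return "Ahoy! Ready to cook?"


cloned = synthesize_prompt(collect_traces(toy_target, ELEMENT_PROBES))
print(cloned)
```

The sketch only shows the data flow: the attacker never sees the hidden prompt, only the grouped outputs, and the cloned prompt is built entirely from those traces.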