S&P 2025

On the (In)Security of LLM App Stores

Xinyi Hou, Yanjie Zhao, Haoyu Wang

Abstract

LLM app stores have grown rapidly, leading to a proliferation of custom LLM apps. However, this expansion raises security concerns. In this study, we propose a three-layer concern framework to identify the potential security risks of LLM apps: LLM apps with abusive potential, LLM apps with malicious intent, and LLM apps with backdoors. Over five months, we collected 786,036 LLM apps from six major app stores: GPT Store, FlowGPT, Poe, Coze, Cici, and Character.AI. Our study integrates static and dynamic analysis and detects harmful content with two complementary techniques: a self-refining LLM-based toxic content detector and rule-based pattern matching. To support the latter, we constructed a large-scale toxic word dictionary (ToxicDict) comprising over 31,783 entries. Using these methods, we found that 15,414 apps had misleading descriptions, 1,366 collected sensitive personal information in violation of their privacy policies, and 15,996 generated harmful content such as hate speech, self-harm, and extremism. We also evaluated the potential for LLM apps to facilitate malicious activities, finding that 616 apps could be used for purposes such as malware generation and phishing. We reported these security risks to the relevant platforms, including OpenAI and Quora, which acknowledged and appreciated our findings. The platforms are actively investigating the flagged apps; as of the submission of this paper, 1,643 apps have been removed from the GPT Store.
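As an illustration of the rule-based half of the complementary detection approach, the following is a minimal Python sketch of dictionary-driven pattern matching combined with an externally supplied LLM verdict. It is not the paper's implementation: the function names (`build_matcher`, `find_flagged_terms`, `is_flagged`), the whole-word matching rule, the OR-combination of the two signals, and the placeholder dictionary entries are all assumptions made for illustration, and the LLM-based detector is represented only by a boolean argument.

```python
import re
from typing import Iterable, List, Optional


def build_matcher(terms: Iterable[str]) -> re.Pattern:
    """Compile a case-insensitive, whole-word pattern from dictionary terms."""
    cleaned = [t.strip() for t in terms if t.strip()]
    if not cleaned:
        # An empty dictionary yields a pattern that never matches.
        return re.compile(r"(?!)")
    # Longest terms first, so multi-word entries are preferred over shorter ones.
    cleaned.sort(key=len, reverse=True)
    alternatives = "|".join(re.escape(t) for t in cleaned)
    # Lookarounds require that a match is not embedded in a longer word.
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


def find_flagged_terms(text: str, matcher: re.Pattern) -> List[str]:
    """Return the distinct dictionary terms found in text, lowercased and sorted."""
    return sorted({m.group(0).lower() for m in matcher.finditer(text)})


def is_flagged(text: str, matcher: re.Pattern,
               llm_verdict: Optional[bool] = None) -> bool:
    """Assumed combination rule: flag if either the rules or the LLM flag the text."""
    return bool(find_flagged_terms(text, matcher)) or bool(llm_verdict)


# Placeholder entries standing in for a real dictionary such as ToxicDict.
PLACEHOLDER_DICT = ["badword", "very bad phrase"]
matcher = build_matcher(PLACEHOLDER_DICT)
hits = find_flagged_terms("This has a BadWord and a very bad phrase.", matcher)
```

In this sketch the rule-based pass is cheap and deterministic, while the optional `llm_verdict` lets a separate model catch content that no dictionary entry covers; how the two signals are actually weighed in the study is not specified in the abstract.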