ICLR 2026
FHE-Coder: Benchmarking Secure Agentic Code Generation for Fully Homomorphic Encryption
Mayank Kumar, Jiaqi Xue, Mengxin Zheng, Qian Lou
Abstract
Fully Homomorphic Encryption (FHE) is a foundational technology for confidential computing, yet its practical adoption remains limited by the need for specialized cryptographic expertise and error-prone parameter configuration. To lower this barrier, we investigate whether Large Language Model (LLM) agents can reliably generate secure FHE code from natural-language specifications. We present FHE-Coder, a three-phase agentic framework that addresses the key failure modes of FHE code generation: semantic ambiguity, API misuse, and cryptographic insecurity. The framework integrates (1) a Prompt Formalizer that structures user intent and enforces secure parameterization, (2) a specialized retrieval-augmented generation (RAG) module that supplies scheme-specific API and documentation knowledge, and (3) an automated Security Verifier that performs iterative validation and feedback to detect and correct cryptographic flaws. We evaluate FHE-Coder across four leading LLMs on a benchmark of ten FHE programming tasks of increasing functional and security complexity. While baseline agents frequently produce code that compiles and passes functional tests, they often violate security constraints or misuse cryptographic parameters. In contrast, FHE-Coder consistently generates solutions that are compilable, functionally correct, and verifiably secure across schemes including TFHE and CKKS. Our work establishes a systematic methodology and benchmark for agentic FHE code generation, providing a practical step toward democratizing secure computation without compromising cryptographic guarantees.

Project page: https://fhe-coder.github.io

Recent advances extend beyond standalone LLMs toward LLM agents, which integrate planning, retrieval, and tool-use capabilities into iterative reasoning pipelines.
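As a rough illustration of the iterative validation-and-feedback loop that the abstract attributes to the Security Verifier, the Python sketch below checks a CKKS parameter set against the 128-bit classical-security limits on the total coefficient-modulus size given in the HomomorphicEncryption.org security standard (the same limits Microsoft SEAL enforces), and feeds any violation back to a repair step. Everything else is an assumption for illustration: `CKKSParams`, `verify`, `revise`, and `generate_with_verification` are hypothetical names, and the repair policy (doubling the ring dimension N) is a deterministic stand-in for an LLM revision, not FHE-Coder's actual method.

```python
from dataclasses import dataclass, replace

# Maximum total coefficient-modulus size log2(Q), in bits, for 128-bit classical
# security with a ternary secret, keyed by ring dimension N. Values follow the
# HomomorphicEncryption.org security standard (also enforced by Microsoft SEAL).
MAX_LOGQ_128 = {1024: 27, 2048: 54, 4096: 109, 8192: 218, 16384: 438, 32768: 881}


@dataclass(frozen=True)
class CKKSParams:
    """Hypothetical structured spec that a formalizer step might emit."""
    poly_modulus_degree: int             # ring dimension N
    coeff_modulus_bits: tuple[int, ...]  # bit sizes of the RNS primes forming Q


def verify(params: CKKSParams) -> list[str]:
    """Security check: return feedback messages; an empty list means accepted."""
    n = params.poly_modulus_degree
    bound = MAX_LOGQ_128.get(n)
    if bound is None:
        return [f"no 128-bit bound tabulated for N={n}"]
    log_q = sum(params.coeff_modulus_bits)
    if log_q > bound:
        return [f"log2(Q)={log_q} exceeds the {bound}-bit limit for 128-bit security at N={n}"]
    return []


def revise(params: CKKSParams, feedback: list[str]) -> CKKSParams:
    """Hypothetical stand-in for an LLM repair step: double N, keep the primes.

    The feedback is ignored here; an LLM-based reviser would condition on it.
    """
    return replace(params, poly_modulus_degree=2 * params.poly_modulus_degree)


def generate_with_verification(params: CKKSParams, max_rounds: int = 5) -> CKKSParams:
    """Run at most max_rounds verify passes, revising after each failed one."""
    feedback: list[str] = []
    for _ in range(max_rounds):
        feedback = verify(params)
        if not feedback:
            return params
        params = revise(params, feedback)
    raise RuntimeError(f"no secure parameters after {max_rounds} rounds: {feedback}")


# Usage: 60+40+40+60 = 200 bits of modulus is too large for N=4096 (limit 109).
draft = CKKSParams(poly_modulus_degree=4096, coeff_modulus_bits=(60, 40, 40, 60))
print(verify(draft))
fixed = generate_with_verification(draft)
print(fixed.poly_modulus_degree)  # 8192, whose 218-bit limit admits 200 bits
```

Doubling N is the simplest repair because the admissible modulus size grows with the ring dimension; a real verifier would also need scheme-specific checks (for example, precision and scale management in CKKS) that this sketch leaves out.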
In adjacent domains such as High-Level Synthesis (HLS) and Register Transfer Level (RTL) hardware design (Thakur et al., 2023; Liao et al., 2024; Xiong et al., 2024), LLM-based systems have demonstrated the ability to