S&P 2025

Guardain: Protecting Emerging Generative AI Workloads on Heterogeneous NPU

Aritra Dhar, Clément Thorens, Lara Magdalena Lazier, Lukas Cavigelli

Abstract

Driven by recent advances in large language models (LLMs), generative AI applications have become the dominant workload for the modern cloud. Specialized hardware accelerators, such as GPUs, NPUs, and TPUs, play a key role in AI adoption due to their superior performance over general-purpose CPUs. AI models and their data are often highly sensitive and come from mutually distrusting parties. Existing industry-standard CPU-based TEEs, such as Intel SGX or AMD SEV, do not adequately protect these accelerators. Device-TEEs like Nvidia-CC only address tightly coupled CPU-GPU systems, with a proprietary solution that requires a TEE on the host CPU side. On the other hand, existing academic proposals target specific CPU-TEE platforms. To address this gap, we propose Guardain, a confidential computing architecture for discrete NPU devices that requires no trust in the host system. Guardain secures data, model parameters, and operator binaries through authenticated encryption. Guardain uses delegation-based memory semantics to ensure isolation from the host software stack, while task attestation guarantees strong model integrity. Our Guardain implementation and evaluation with state-of-the-art LLMs such as Llama2 and Llama3 show that Guardain introduces minimal overhead with no changes to the AI software stack.