CCS 2025
THOR: Secure Transformer Inference with Homomorphic Encryption
Jungho Moon, Dongwoo Yoo, Xiaoqian Jiang, Miran Kim
Abstract
As large language models are increasingly deployed in cloud environments, privacy concerns have become a significant issue. To address this challenge, we present THOR, a non-interactive framework for secure transformer inference using homomorphic encryption. We first propose efficient matrix multiplication algorithms based on diagonal-major encoding and compact ciphertext packing. We extend these basic algorithms to support plaintext-ciphertext matrix multiplication (PC-MM) using parallel submatrix computation and ciphertext-ciphertext multiplication (CC-MM) with a baby-step giant-step strategy. We also design efficient evaluation strategies for non-linear functions such as softmax, LayerNorm, GELU, and Tanh, by integrating advanced approximation techniques with adaptive iterative methods. Our matrix multiplication algorithms outperform state-of-the-art methods, achieving up to 5.3× speedup in PC-MM for $\mathbb{R}^{768 \times 768} \times \mathbb{R}^{768 \times 128}$ over BOLT (Pang et al., IEEE S&P 2024) and 9.7× in CC-MM for $12 \times (\mathbb{R}^{64 \times 128} \times \mathbb{R}^{128 \times 128})$ over Powerformer (Park et al., Preprint). THOR enables secure inference on the BERT-base model with 128 tokens in 10 minutes on a single GPU, while maintaining comparable accuracy on GLUE tasks.
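To give a feel for the two ideas named in the abstract, diagonal encoding and baby-step giant-step (BSGS), the following is a minimal plaintext sketch of the classic diagonal method for a matrix-vector product. It is not THOR's algorithm: the abstract does not spell out its packing or its CC-MM procedure, so the function names (`rot`, `diagonals`, `matvec_diag`, `matvec_bsgs`) and the `baby` parameter are illustrative assumptions. NumPy arrays stand in for ciphertext slots and `np.roll` stands in for a homomorphic slot rotation.

```python
import numpy as np

def rot(x, k):
    """Cyclic left rotation by k slots (stand-in for a homomorphic rotation)."""
    return np.roll(x, -k)

def diagonals(M):
    """Return the n generalized diagonals of a square matrix M.

    Diagonal i holds M[j, (j + i) mod n] in slot j.
    """
    n = M.shape[0]
    idx = np.arange(n)
    return [M[idx, (idx + i) % n] for i in range(n)]

def matvec_diag(M, v):
    """Naive diagonal method: one rotation of v per nonzero diagonal index."""
    n = len(v)
    diags = diagonals(M)
    acc = np.zeros(n)
    for i in range(n):
        acc += diags[i] * rot(v, i)
    return acc

def matvec_bsgs(M, v, baby):
    """Diagonal method with baby-step giant-step.

    Writes each diagonal index as i = k * baby + l. The `baby` rotations of v
    are computed once and reused; each giant step needs one further rotation.
    Diagonals are pre-rotated, which is free when the matrix is in plaintext.
    """
    n = len(v)
    assert n % baby == 0, "this sketch assumes baby divides n"
    diags = diagonals(M)
    baby_rots = [rot(v, l) for l in range(baby)]  # baby steps
    acc = np.zeros(n)
    for k in range(n // baby):                    # giant steps
        inner = np.zeros(n)
        for l in range(baby):
            pre_rotated = rot(diags[k * baby + l], -k * baby)
            inner += pre_rotated * baby_rots[l]
        acc += rot(inner, k * baby)
    return acc

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((8, 8))
    v = rng.standard_normal(8)
    print(np.allclose(matvec_bsgs(M, v, baby=4), M @ v))
```

In this sketch the naive method rotates the vector n - 1 times, while the BSGS variant needs (baby - 1) + (n / baby - 1) rotations of encrypted data, which is smallest when `baby` is close to the square root of n. Rotations are typically among the most expensive homomorphic operations, which is why this kind of restructuring matters.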