CCS2024

Fast and Accurate Homomorphic Softmax Evaluation

Wonhee Cho, Guillaume Hanrot, Taeseong Kim, Minje Park, Damien Stehlé

Abstract

Homomorphic encryption is one of the main approaches for building secure and privacy-preserving Machine Learning as a Service, a major challenge in a society where AI is becoming more and more pervasive. This motivates the development of homomorphic algorithms for the main building blocks of AI, typically the components of the various types of neural network architectures. Among those components, we focus on the Softmax function, defined by $\mathrm{Softmax}(\mathbf{x}) = \left( \exp(x_i) / \sum_{j=1}^{n} \exp(x_j) \right)_{1 \le i \le n}$. [...] in 486s (single-thread CPU), corresponding to an amortized 0.06s per Softmax call. All Softmax calls of the 32-layer LLaMa large language model (7B version) with context length 128 take around 1.5 minutes on an RTX-6000 GPU, and the final Softmax call in dimension 32768 for token generation takes less than 3 seconds. This suggests that near-practicality may be accessible with dedicated hardware.
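
For reference, the Softmax definition in the abstract can be written as a short plaintext routine. This is only a minimal sketch of the function on unencrypted data, not the homomorphic algorithm of the paper. The function name `softmax` and the max-subtraction step are assumptions added here for illustration: subtracting the largest input is a standard plaintext trick that leaves the result unchanged while keeping `exp` from overflowing.

```python
import math


def softmax(x):
    """Plaintext reference: Softmax(x)_i = exp(x_i) / sum_j exp(x_j)."""
    if not x:
        raise ValueError("softmax needs at least one input")
    # Assumption for illustration: shift by the maximum so that every
    # exponent is <= 0. The ratio exp(x_i) / sum_j exp(x_j) is unchanged.
    shift = max(x)
    exps = [math.exp(v - shift) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical input vector of dimension n = 3.
probs = softmax([1.0, 2.0, 3.0])
```

The outputs are positive, sum to 1, and preserve the ordering of the inputs. Without the shift, an input such as 1000.0 would already overflow `math.exp`, which gives a sense of how wide the range of `exp(x_i)` can be.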