CCS2025

Scalable Cryptography for Trustworthy Machine Learning in the LLM Era

Gefei Tan

Abstract

Modern cryptographic tools such as multi-party computation (MPC) and zero-knowledge proofs (ZKPs) offer strong, provable security guarantees, but in their generic form these protocols remain impractical for production-scale machine learning (ML), especially in the era of large language models (LLMs). This thesis proposal advances the central claim that cryptographic protocols co-designed with the structure of specific ML subtasks can achieve practical efficiency without compromising privacy or verifiability. To validate this claim, the proposal develops three interconnected research thrusts: (1) Confidential Outsourced Training. Customized MPC protocols shift expensive cryptographic steps to local computations, enabling resource-constrained data owners to securely train large models in untrusted clouds. (2) Scalable MPC Primitives for Large Datasets. Provably secure building blocks, such as oblivious shuffles, private joins, and sparse linear algebra routines, bridge the performance gap in privacy-preserving data pipelines at scale. (3) Verifiable ML without Retraining. Rather than proving each training step, a new proof-of-optimality framework certifies that a trained or fine-tuned model (e.g., LoRA adapters) satisfies desired properties, enabling efficient, auditable deployment without re-executing training. Together, these efforts aim to close the long-standing gap between privacy and efficiency, demonstrating that strong cryptographic guarantees and modern ML workflows can be reconciled through principled, application-aware design.