KDD2025

Reinvent the Operation not the Architecture: Quantum-inspired High-order Product for Compatible and Improved LLMs Training

Hao Xiong, Yebin Yang, Huaijin Wu, Xiaoqiu Zhong, Yehui Tang, Zhuo Xia, Xiaoxing Wang, Junchi Yan


Abstract

We rethink the basic operations used in neural networks, namely the inner product and matrix multiplication. We propose a quantum-inspired alternative that exploits the power of high-dimensional Hilbert space through a high-order form of the tensor product. We re-parameterize the original (low-order) vectors and matrices into an expressive high-order form without adding model parameters and with negligible extra computational overhead (e.g., about 2%). Because it is an in-place, transparent atomic operation, we show its use in the key components of Transformers: token embeddings, attention (query, key, value), and the MLP. Owing to its inherent compatibility with vanilla multiplicative operations, we propose C2Q-SFT, a classic-to-quantum (C2Q) protocol for supervised fine-tuning (SFT): it continues training a given model by transparently replacing the standard operations with ours. Our experiments show advantages both for training from scratch and for fine-tuning on downstream tasks across LLM scales. C2Q-SFT consistently outperforms standard SFT, with relative improvements on MMLU (+0.56%) and GSM8k (+0.61%). This work sheds light on innovation in the operations of networks, which is orthogonal to efforts on new architectures, position encodings, training algorithms, etc. See the project page at: https://github.com/Thinklab-SJTU/LLM/QI-LLM.
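The abstract does not spell out the high-order operator, so the following is only a minimal sketch of the general idea it alludes to, not the authors' method. It assumes a hypothetical scheme in which a vector is split into two halves whose tensor (Kronecker) product serves as its high-order form. Under that assumption, a standard identity, ⟨a⊗b, c⊗d⟩ = ⟨a,c⟩⟨b,d⟩, lets the inner product in the larger space be computed from the original parameters without ever materializing the high-order vectors. All function names here are illustrative.

```python
from itertools import product


def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def kron(u, v):
    """Tensor (Kronecker) product of two vectors, flattened to a list."""
    return [x * y for x, y in product(u, v)]


def split_halves(w):
    """Assumed re-parameterization: split a vector into two equal halves."""
    mid = len(w) // 2
    return w[:mid], w[mid:]


def high_order_dot(q, k):
    """Inner product of kron(q1, q2) and kron(k1, k2) without forming them.

    Relies on the identity <a (x) b, c (x) d> = <a, c> * <b, d>.
    """
    q1, q2 = split_halves(q)
    k1, k2 = split_halves(k)
    return dot(q1, k1) * dot(q2, k2)


# Six parameters per vector describe a 3 * 3 = 9-dimensional high-order vector.
q = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
k = [1.0, 0.0, -1.0, 2.0, 1.0, 0.0]

high_q = kron(*split_halves(q))
high_k = kron(*split_halves(k))

explicit = dot(high_q, high_k)   # forms the 9-dimensional vectors first
implicit = high_order_dot(q, k)  # uses only the original 6 parameters
```

In this toy setting the explicit and implicit results agree, and the parameter count is unchanged while the effective dimension grows from 6 to 9. Whether the paper's operator has this exact factorized form is an assumption of the sketch.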