ACL 2025
Decoder-Only LLMs can be Masked Auto-Encoders
Dan Qiao, Yuan Gao, Zheming Yang, Di Yang, Ziheng Wu, Pengcheng Lu, Minghui Qiu, Juntao Li, Min Zhang
Cited by 1
Abstract
Modern NLP workflows (e.g., RAG systems) require different models for generation and embedding tasks: decoder-only Large Language Models (LLMs) dominate generation, while bidirectional pre-trained encoders dominate embedding. These structural differences add development cost and limit knowledge sharing between tasks. In this work, we present UniMAE, a novel unsupervised training method that transforms a Decoder-Only LLM into a Uni-Directional Masked Auto-Encoder. UniMAE compresses high-quality semantic information into the [EOS] embedding while preserving the generation capabilities of LLMs. Comprehensive evaluations across 56 MTEB datasets demonstrate that UniMAE can achieve state-of-the-art results under unsupervised settings with merely 100 training steps, establishing the first effective approach to unifying generation and representation learning in decoder-only architectures.
* Corresponding author
[Figure: UniMAE overview with components labeled Random Mask, Decoder-Only LLM, and Decoder (Query, Key, Value); masked input x1, mask, x3, mask, eos; output embeddings e1 to e4 and es.]
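To make the role of the [EOS] position concrete, the following is a minimal pure-Python sketch. It is not the authors' implementation. It only illustrates two ideas named in the abstract: randomly masking input tokens, and the fact that under uni-directional (causal) attention the final [EOS] position is the only one that attends to every token, which makes it a natural place to pool a sequence embedding. The helper names (`random_mask`, `causal_attention`), the 50% mask ratio, the identity projections, and the toy 4-dimensional embeddings are all assumptions made for illustration.

```python
import math
import random


def random_mask(tokens, mask_ratio, rng, mask_token="[MASK]", eos_token="[EOS]"):
    """Replace a random subset of non-EOS tokens with mask_token (hypothetical helper)."""
    candidates = [i for i, t in enumerate(tokens) if t != eos_token]
    k = round(mask_ratio * len(candidates))
    chosen = set(rng.sample(candidates, k))
    return [mask_token if i in chosen else t for i, t in enumerate(tokens)]


def causal_attention(x):
    """Single-head causal self-attention with identity projections (illustrative only).

    x is a list of n token vectors. Position i attends only to positions 0..i.
    Returns (outputs, weights), where weights[i][j] is the attention from i to j.
    """
    n, d = len(x), len(x[0])
    outputs, weights = [], []
    for i in range(n):
        # Scaled dot-product scores against the visible prefix only (j <= i).
        scores = [
            sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        # Future positions get exactly zero weight.
        row = [e / total for e in exps] + [0.0] * (n - i - 1)
        weights.append(row)
        outputs.append([sum(row[j] * x[j][k] for j in range(n)) for k in range(d)])
    return outputs, weights


rng = random.Random(0)
tokens = ["x1", "x2", "x3", "x4", "[EOS]"]
masked = random_mask(tokens, mask_ratio=0.5, rng=rng)

# Toy embedding table (assumption: random 4-dimensional vectors).
embed = {t: [rng.uniform(-1.0, 1.0) for _ in range(4)] for t in tokens + ["[MASK]"]}

outputs, weights = causal_attention([embed[t] for t in masked])

# Last-position pooling: the [EOS] output serves as the sequence embedding.
eos_embedding = outputs[-1]
```

In this sketch, `weights[-1]` has non-zero mass on every position while `weights[0]` sees only itself, which is why a decoder-only model can expose a whole-sequence representation at [EOS] without bidirectional attention. How UniMAE actually trains that representation (the decoder and reconstruction objective) is not specified in the abstract and is not modeled here.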