EMNLP 2023

Once is Enough: A Light-Weight Cross-Attention for Fast Sentence Pair Modeling

Yuanhang Yang, Shiyi Qi, Chuanyi Liu, Qifan Wang, Cuiyun Gao, Zenglin Xu

Abstract

Transformer-based models have achieved great success on sentence pair modeling tasks, such as answer selection and natural language inference (NLI). These models generally perform cross-attention over input pairs, leading to prohibitive computational costs. Recent studies propose dual-encoder and late-interaction architectures for faster computation. However, the balance between the expressiveness of cross-attention and the computational speedup still needs to be better coordinated. To this end, this paper introduces MixEncoder, a novel paradigm for efficient sentence pair modeling. MixEncoder involves a lightweight cross-attention mechanism. It avoids repeatedly encoding the same query for different candidates, which allows the query-candidate interactions to be modeled in parallel. Extensive experiments conducted on four tasks demonstrate that MixEncoder can speed up sentence pairing by over 113x while achieving performance comparable to that of the more expensive cross-attention models. The source code is available at https://github.com/ysngki/MixEncoder.