ACL 2024

EIT: Enhanced Interactive Transformer

Tong Zheng, Bei Li, Huiwen Bao, Tong Xiao, JingBo Zhu

Abstract

Two principles, the complementary principle and the consensus principle, are widely acknowledged in the multi-view learning literature. However, the current design of multi-head self-attention, an instance of multi-view learning, prioritizes complementarity while ignoring consensus. To address this problem, we propose an enhanced multi-head self-attention (EMHA). First, to satisfy the complementary principle, EMHA removes the one-to-one mapping constraint among queries and keys in multiple subspaces and allows each query to attend to multiple keys. On top of that, we develop a method to fully encourage consensus among heads by introducing two interaction models, namely Inner-Subspace Interaction and Cross-Subspace Interaction. Extensive experiments on a wide range of language tasks (e.g., machine translation, abstractive summarization, grammar correction, and language modeling),