ACL 2024

Dependency Transformer Grammars: Integrating Dependency Structures into Transformer Language Models

Yida Zhao, Chao Lou, Kewei Tu

Abstract

Syntactic Transformer language models aim to achieve better generalization through simultaneously modeling syntax trees and sentences. While prior work has been focusing on adding constituency-based structures to Transformers, we introduce Dependency Transformer Grammars (DTGs), a new class of Transformer language model with explicit dependency-based inductive bias. DTGs simulate dependency transition systems with constrained attention patterns by modifying attention masks, incorporate the stack information through relative positional encoding, and augment dependency arc representation with a combination of token embeddings and operation embeddings. When trained on a dataset of sentences annotated with dependency trees, DTGs achieve better generalization while maintaining comparable perplexity with Transformer language model baselines. DTGs also outperform recent constituency-based models, showing that dependency can better guide Transformer language models. Our code will be publicly available upon acceptance.

1 Introduction

Transformer language models have shown strong performance on language modeling tasks and a broad spectrum of downstream tasks (Radford