ICLR2023

Relational Attention: Generalizing Transformers for Graph-Structured Tasks

Cameron Diao, Ricky Loynd

被引用 6 次

摘要

Transformers flexibly operate over sets of real-valued vectors representing taskspecific entities and their attributes, where each vector might encode one wordpiece token and its position in a sequence, or some piece of information that carries no position at all. But as set processors, standard transformers are at a disadvantage in reasoning over more general graph-structured data where nodes represent entities and edges represent relations between entities. To address this shortcoming, we generalize transformer attention to consider and update edge vectors in each transformer layer. We evaluate this relational transformer on a diverse array of graph-structured tasks, including the large and challenging CLRS Algorithmic Reasoning Benchmark. There, it dramatically outperforms state-of-theart graph neural networks expressly designed to reason over graph-structured data. Our analysis demonstrates that these gains are attributable to relational attention's inherent ability to leverage the greater expressivity of graphs over sets.