CVPR 2023

Neighborhood Attention Transformer

Ali Hassani, Steven Walton, Jiachen Li, Shen Li, Humphrey Shi

Abstract

https://github.com/SHI-Labs/Neighborhood-Attention-Transformer

[Figure 1 panels: Self Attention (ViT) | Window Self Attention (Swin) | Shifted Window Self Attention (Swin) | Neighborhood Attention (NAT)]

Figure 1. An illustration of attention spans in Self Attention, (Shifted) Window Self Attention, and our Neighborhood Attention. Self Attention allows each token to attend to every token. Window Self Attention restricts self attention to non-overlapping sub-windows, and is followed by Shifted Window Self Attention, which allows the out-of-window interactions that are necessary for receptive field expansion. Neighborhood Attention localizes attention to a neighborhood around each token, introducing local inductive biases, maintaining translational equivariance, and allowing receptive field growth without needing extra operations.
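The attention spans contrasted in Figure 1 can be illustrated with a small 1D sketch in plain Python. This is not the authors' implementation (their code is at the repository linked above); the function names `self_attention_span`, `window_span`, `neighborhood_span`, and `attend` are hypothetical, the tokens lie on a line rather than a 2D feature map, and the handling of border tokens (sliding the neighborhood inward so every token still sees `k` keys) is an assumption of this sketch.

```python
import math


def self_attention_span(n):
    """Self attention: every token attends to all n tokens."""
    return [list(range(n)) for _ in range(n)]


def window_span(n, k):
    """Window self attention: tokens are split into non-overlapping
    windows of size k and attend only within their own window."""
    assert n % k == 0, "this sketch assumes n is divisible by k"
    return [list(range((i // k) * k, (i // k) * k + k)) for i in range(n)]


def neighborhood_span(n, k):
    """Neighborhood attention: each token attends to k tokens around itself.
    Assumption: near the borders the neighborhood is slid inward, so the
    span always has exactly k entries."""
    assert k % 2 == 1 and k <= n, "this sketch assumes an odd k with k <= n"
    half = k // 2
    spans = []
    for i in range(n):
        start = min(max(i - half, 0), n - k)
        spans.append(list(range(start, start + k)))
    return spans


def attend(queries, keys, values, spans):
    """Scaled dot-product attention in which token i sees only spans[i]."""
    d = len(queries[0])
    out = []
    for i, span in enumerate(spans):
        scores = [
            sum(a * b for a, b in zip(queries[i], keys[j])) / math.sqrt(d)
            for j in span
        ]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * values[j][c] for w, j in zip(weights, span)) / total
            for c in range(len(values[0]))
        ])
    return out


if __name__ == "__main__":
    n, k = 6, 3
    for name, spans in [
        ("self", self_attention_span(n)),
        ("window", window_span(n, k)),
        ("neighborhood", neighborhood_span(n, k)),
    ]:
        print(name, spans)
```

In this sketch the window spans are identical for every token in the same window, so information cannot cross window boundaries without an extra shifted step, whereas neighboring tokens' neighborhood spans overlap, which is how stacked layers can grow the receptive field without additional operations.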