ICCV 2023

GET: Group Event Transformer for Event-Based Vision

Yansong Peng, Yueyi Zhang, Zhiwei Xiong, Xiaoyan Sun, Feng Wu

Cited by 86

Abstract

Event cameras are a novel type of neuromorphic sensor that has been gaining increasing attention. Existing event-based backbones mainly rely on image-based designs to extract spatial information within the image transformed from events, overlooking important event properties like time and polarity. To address this issue, we propose a novel Group-based vision Transformer backbone for Event-based vision, called Group Event Transformer (GET), which decouples temporal-polarity information from spatial information throughout the feature extraction process. Specifically, we first propose a new event representation for GET, named Group Token, which groups asynchronous events based on their timestamps and polarities. Then, GET applies the Event Dual Self-Attention block and the Group Token Aggregation module to facilitate effective feature communication and integration in both the spatial and temporal-polarity domains. After that, GET can be integrated with different downstream tasks by connecting it with various heads. We evaluate our method on four event-based classification datasets (Cifar10-DVS, N-MNIST, N-CARS, and DVS128Gesture) and two event-based object detection datasets (1Mpx and Gen1), and the results demonstrate that GET outperforms other state-of-the-art methods. The code is available at https://github.com/Peterande/GET-Group-Event-Transformer.
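To make the grouping idea concrete, the following is a minimal sketch of binning asynchronous events by timestamp and polarity into per-group count maps. It is a simplified illustration and not the authors' Group Token implementation: the function name `group_events`, the `(x, y, t, p)` event layout with polarity in {0, 1}, and the `num_time_bins` parameter are all assumptions made for this example, and the actual embedding into tokens is omitted.

```python
import numpy as np


def group_events(events, height, width, num_time_bins=2):
    """Count events per (time bin, polarity) group at each pixel.

    Hypothetical helper: `events` is assumed to be an (N, 4) array of
    (x, y, t, p) rows with polarity p in {0, 1}.
    Returns an integer array of shape (num_time_bins * 2, height, width).
    """
    counts = np.zeros((num_time_bins * 2, height, width), dtype=np.int64)
    if len(events) == 0:
        return counts

    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    t = events[:, 2].astype(np.float64)
    p = events[:, 3].astype(np.int64)

    # Map each timestamp to a time bin in [0, num_time_bins - 1].
    t_span = max(t.max() - t.min(), 1e-9)
    t_bin = ((t - t.min()) / t_span * num_time_bins).astype(np.int64)
    t_bin = np.minimum(t_bin, num_time_bins - 1)

    # One group per (time bin, polarity) pair.
    group = t_bin * 2 + p

    # Unbuffered accumulation so repeated (group, y, x) indices all count.
    np.add.at(counts, (group, y, x), 1)
    return counts


# Example: four events on a 2x2 sensor, split into 2 time bins x 2 polarities.
events = np.array([
    [0, 0, 0.0, 1],
    [1, 0, 0.1, 0],
    [1, 1, 0.9, 1],
    [1, 1, 1.0, 1],
])
grouped = group_events(events, height=2, width=2, num_time_bins=2)
print(grouped.shape)
```

Keeping each (time bin, polarity) group as a separate channel leaves temporal-polarity information distinguishable from spatial position, which is the property the abstract attributes to the Group Token representation.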