ICCV 2023

Ordered Atomic Activity for Fine-grained Interactive Traffic Scenario Understanding

Nakul Agarwal, Yi-Ting Chen

Cited by 7

Abstract

We introduce a novel representation called Ordered Atomic Activity for interactive scenario understanding. The representation decomposes each scenario into a set of ordered atomic activities, where each activity consists of an action and the actors involved, and the order denotes the temporal development of the scenario. This design also helps in identifying important interactive relationships, such as yielding. The action is a high-level semantic motion pattern that is grounded in the surrounding road topology, which we decompose into zones and corners with unique IDs. For example, a group of pedestrians crossing in front is denoted as C1 → C4: P+, as depicted in Figure 1. We collect a new large-scale dataset called OATS (Ordered Atomic Activities in interactive Traffic Scenarios), comprising 1026 video clips (∼20 s each) captured at intersections in the San Francisco Bay Area. Each clip is labeled with the proposed language, resulting in 59 activity categories and 6512 annotated activity instances. We propose three fine-grained scenario understanding tasks: multi-label atomic activity recognition, activity order prediction, and interactive scenario retrieval. We also propose a Graph Convolutional Network-based framework that models both the appearance and the motion of traffic participants to tackle these tasks, and it performs favorably against state-of-the-art methods. However, we find that none of the methods achieves satisfactory performance, indicating opportunities for the community to develop new algorithms for these tasks towards better interactive scenario understanding.
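To make the notation concrete, the following is a minimal Python sketch of how a label such as C1 → C4: P+ could be parsed into a structured record, and how a scenario could be held as an ordered list of such records. It is an illustration only, not the authors' annotation tooling. The names `AtomicActivity` and `parse_activity` are hypothetical, the `Z` prefix for zones and the actor code `C` in the second label are assumptions, and reading the trailing `+` as "a group of actors" is inferred from the single example in the abstract.

```python
import re
from dataclasses import dataclass

# Assumed label grammar: "<region> → <region>: <actor>[+]".
# A region is one letter plus an ID (e.g. "C1"); "->" is accepted as an ASCII arrow.
_LABEL = re.compile(
    r"^\s*([A-Z]\d+)\s*(?:→|->)\s*([A-Z]\d+)\s*:\s*([A-Z]+)(\+?)\s*$"
)


@dataclass(frozen=True)
class AtomicActivity:
    """One atomic activity: a motion between two regions plus the actor type."""
    origin: str        # region where the motion starts, e.g. "C1"
    destination: str   # region where the motion ends, e.g. "C4"
    actor: str         # actor code, e.g. "P" for pedestrian
    is_group: bool     # True when the label carries a trailing "+"


def parse_activity(label: str) -> AtomicActivity:
    """Parse a single label; raise ValueError if it does not match the grammar."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognized activity label: {label!r}")
    origin, destination, actor, plus = match.groups()
    return AtomicActivity(origin, destination, actor, plus == "+")


# A scenario is an ordered list: list position encodes temporal order.
# The second label is a hypothetical example, not taken from the dataset.
scenario = [
    parse_activity("C1 → C4: P+"),
    parse_activity("Z1 -> Z3: C"),
]
```

Storing the scenario as a plain list keeps the temporal order implicit in the index, which is the simplest way to mirror the "ordered" part of the representation; a real pipeline might attach timestamps instead.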