NeurIPS 2023
EgoTracks: A Long-term Egocentric Visual Object Tracking Dataset
Hao Tang, Kevin J. Liang, Kristen Grauman, Matt Feiszli, Weiyao Wang
Abstract
Visual object tracking is key to many egocentric vision problems. However, the full spectrum of egocentric tracking challenges faced by an embodied AI is underrepresented in many existing datasets, which tend to focus on short, third-person videos. Egocentric video has several characteristics that distinguish it from the videos commonly found in past datasets. Frequent large camera motions and hand interactions with objects often cause occlusions or make objects exit the frame. Object appearance can also change rapidly due to widely different points of view, scales, or object states. Embodied tracking is also naturally long-term, so it is critical to consistently (re-)associate objects with their appearances and disappearances over periods as long as a lifetime. Previous datasets under-emphasize this re-detection problem, and their "framed" nature has led to the adoption of various spatiotemporal priors that we find do not necessarily generalize to egocentric video. We thus introduce EgoTracks, a new dataset for long-term egocentric visual object tracking. Sourced from the Ego4D dataset, EgoTracks presents a significant challenge to recent state-of-the-art single-object trackers. According to traditional tracking metrics, we find that these trackers score more poorly on our new dataset than on existing popular benchmarks. We further show that modifications to the STARK tracker can significantly increase its performance on egocentric data, resulting in a baseline model we call EgoSTARK. We publicly release our annotations and benchmark (https://github.com/EGO4D/episodic-memory/tree/main/EgoTracks), in the hope that our dataset leads to further advances in tracking.

37th Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks.