CVPR 2023

Unified Keypoint-Based Action Recognition Framework via Structured Keypoint Pooling

Ryo Hachiuma, Fumiaki Sato, Taiki Sekii

Abstract

Figure 1. Qualitative results of the proposed framework for skeleton-based action recognition (top) and skeleton-based spatio-temporal action localization (bottom). The input keypoints and the estimated action labels are visualized. We achieve state-of-the-art accuracy on the recognition task while running at ∼1800 FPS on a single RTX 3080 Ti GPU. In addition, the proposed method outperforms state-of-the-art weakly supervised spatio-temporal localization methods. See the website for the demo video.