CVPR 2025
Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model
Yuting Zhang, Hao Lu, Qingyong Hu, Yin Wang, Kaishen Yuan, Xin Liu, Kaishun Wu
Abstract
[Figure 1 example. Question: "How many times did the person in the video repeat running on the treadmill?" Model responses shown: "The person in the video repeated running 5 times."; "The repetitive motion in the video is running on the treadmill 14 times."; "The analysis did not detect any clear repetitions in the video based on the motion patterns."]

Figure 1. Existing multimodal models may fail to analyze periodic tasks, such as motion counting, traffic flow, weather forecasting, and remote health care. For example, GPT-4 fails to detect clear repetitions due to the difficulty in analyzing motion patterns, and Video-LLaMA incorrectly counts the repetitions. In contrast, Period-LLM accurately analyzes the actions and provides the correct repetition count.