CVPR 2024

vid-TLDR: Training Free Token merging for Light-Weight Video Transformer

Joonmyung Choi, Sanghyeok Lee, Jaewon Chu, Minhyuk Choi, Hyunwoo J. Kim

Abstract

Figure 1. Comparison of vid-TLDR (Ours) with UMT [33]. Without any additional training, vid-TLDR achieves performance comparable to, or even better than, that of the base model UMT (left) while considerably reducing the computational cost (right). UMT-B (87M) is used.