CVPR 2025
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
Keda Tao, Can Qin, Haoxuan You, Yang Sui, Huan Wang
Abstract
We introduce DyCoke (dynamic compression of tokens), a training-free token compression method for fast video large language models. Its key innovation over its predecessors is to remove redundant tokens dynamically during the decoding stage, reducing both the temporal redundancy (across video frames) and the spatial redundancy in visual tokens. Figure, right panel: efficiency and performance comparison of various training-free token pruning methods on MVBench [23] with LLaVA-OV-7B [18]. DyCoke surpasses the SoTA counterparts (PruMerge [39], FastV [3]), achieving a 1.5× inference speedup and a 1.4× reduction in memory usage relative to the baseline while also improving performance.
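
To make the idea of temporal redundancy concrete, the following is a minimal sketch of one way to drop visual tokens that barely change between adjacent frames. It is not the authors' algorithm: the abstract does not state the redundancy criterion, so the cosine-similarity test, the `threshold` parameter, and the function names `cosine` and `prune_temporal_redundancy` are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def prune_temporal_redundancy(frames, threshold=0.9):
    """Return (frame_index, token_index) pairs for the tokens that are kept.

    Assumption (hypothetical criterion, not taken from the paper): a token is
    treated as temporally redundant when it is nearly identical to the token
    at the same spatial position in the previous frame.
    Every frame is assumed to hold the same number of tokens.
    """
    kept = []
    for t, frame in enumerate(frames):
        for i, token in enumerate(frame):
            # The first frame has no predecessor, so all of its tokens are kept.
            if t > 0 and cosine(token, frames[t - 1][i]) > threshold:
                continue  # redundant with the previous frame: drop it
            kept.append((t, i))
    return kept


# Two frames with two 2-D tokens each; only the second token changes.
frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.0]],
]
print(prune_temporal_redundancy(frames))
```

In this toy input the first token of the second frame is identical to its predecessor and is dropped, while the second token has changed and is kept. A training-free method in this spirit needs no weight updates, only a rule for deciding which tokens to discard at inference time.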