CVPR 2025

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Joya Chen, Ziyun Zeng, Yiqi Lin, Wei Li, Zejun Ma, Mike Zheng Shou

2 Bytedance. * Equal Contribution. Corresponding Author.

Abstract

[...] improvements in general video QA and exhibits a new capability in real-time video commentary. To evaluate this capability, we design a new benchmark, LiveSports-3K, which uses LLM-as-a-judge to score free-form commentary. Experiments show that our final model, LiveCC-7B, can surpass LLaVA-Video-72B in commentary quality even when operating in real-time mode. It also achieves state-of-the-art results at the 7B scale on popular benchmarks such as VideoMME, demonstrating broad generalizability. All resources for this paper have been released at showlab.github.io/livecc.