CVPR 2025

Mimir: Improving Video Diffusion Models for Precise Text Understanding

Shuai Tan, Biao Gong, Yutong Feng, Kecheng Zheng, Dandan Zheng, Shuwei Shi, Yujun Shen, Jingdong Chen, Ming Yang

Abstract

A majestic eagle soars above a vast, snow-covered forest, its powerful wings cutting through the crisp winter air. Below, a dense canopy of evergreen trees is blanketed in pristine snow, creating a serene landscape.

A woman with short, curly silver hair in a red dress stands in a dimly lit, futuristic setting, looking to her left. Neon lights and digital screens fill the scene. By the end, she gazes away from the viewer against a neon cityscape, conveying mystery and anticipation.

The transformation from a flower bud to a fully blooming flower.

A vast, golden desert stretches endlessly under a brilliant blue sky in the early morning light. As the sun sets, the sky transforms into a canvas of vibrant oranges. Finally, the night falls, revealing a breathtaking canopy of stars, with the Milky Way arching gracefully over the tranquil.