CVPR 2025

ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models

Ozgur Kara, Krishna Kumar Singh, Feng Liu, Duygu Ceylan, James M. Rehg, Tobias Hinz

Abstract

Project page: https://shotadapter.github.io/

[Figure: example videos with their per-shot text prompts]

(a) 2-Shot Video
"a woman knits on the couch in a warm living room" / "she then organizes clothes in a walk in closet"
"a man plays the guitar in his music room" / "he then records a song on his laptop, listening with headphones"

(b) 3-Shot Video
"a woman sits at a kitchen table, softly illuminated by morning light as she glances out the window" / "she sips her coffee, her eyes drifting thoughtfully across the room" / "the woman then begins to eat a simple meal, her movements slow and deliberate"
"a man sketches in a notebook at a quiet cafe, his hand moving quickly across the page" / "he pauses, looking up thoughtfully before continuing his drawing" / "later, the man steps outside, his notebook tucked under his arm as he takes in the city around him"