ICLR 2025

EgoExo-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos

Jilan Xu, Yifei Huang, Baoqi Pei, Junlin Hou, Qingqiu Li, Guo Chen, Yuejie Zhang, Rui Feng, Weidi Xie

Abstract

Figure 1: The cross-view video prediction task aims to predict future RGB frames of the ego-centric video, given the first ego-centric frame, a text instruction, and a synchronised exo-centric video. (Panels: exo-centric video; ego-centric video, showing the first frame and the predicted future frames. Example instruction: "C stirs the noodles in the pot with the spoon in his right hand.")