CVPR 2025

Navigation World Models

Amir Bar, Gaoyue Zhou, Danny Tran, Trevor Darrell, Yann LeCun

Abstract

Figure 1. We train a Navigation World Model (NWM) from video footage of robots and their associated navigation actions (a). After training, NWM can evaluate trajectories by synthesizing their videos and scoring the final frame's similarity with the goal (b). We use NWM to plan from scratch or to rank expert navigation trajectories, improving downstream visual navigation performance. In unknown environments, NWM can simulate imagined trajectories from a single image (c). In all examples above, the input to the model is the first image and the actions; the model then autoregressively synthesizes future observations.
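The evaluation loop described in the caption (roll out each candidate action sequence autoregressively, then score the final synthesized frame against the goal) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `toy_world_model` is a hypothetical stand-in that adds 2D displacements instead of synthesizing video frames, observations are 2D points rather than images, and `similarity` uses negative squared distance rather than any image similarity measure the paper may use.

```python
from typing import Callable, List, Sequence, Tuple

Obs = Tuple[float, float]     # toy stand-in for an image observation (assumption)
Action = Tuple[float, float]  # toy stand-in for a navigation action (assumption)
Model = Callable[[Obs, Action], Obs]


def toy_world_model(obs: Obs, action: Action) -> Obs:
    """Hypothetical one-step predictor: next observation from the current one and an action."""
    return (obs[0] + action[0], obs[1] + action[1])


def rollout(model: Model, first_obs: Obs, actions: Sequence[Action]) -> List[Obs]:
    """Autoregressively synthesize observations: each prediction feeds the next step."""
    frames = [first_obs]
    for action in actions:
        frames.append(model(frames[-1], action))
    return frames


def similarity(a: Obs, b: Obs) -> float:
    """Toy similarity: negative squared distance, so higher means more similar."""
    return -((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)


def rank_trajectories(
    model: Model,
    first_obs: Obs,
    goal: Obs,
    candidates: Sequence[Sequence[Action]],
) -> List[int]:
    """Return candidate indices ordered from best to worst final-frame similarity."""
    scores = [
        similarity(rollout(model, first_obs, actions)[-1], goal)
        for actions in candidates
    ]
    # sorted() is stable, so tied candidates keep their original order.
    return sorted(range(len(candidates)), key=lambda i: -scores[i])


candidates = [
    [(1.0, 0.0), (1.0, 0.0)],  # ends at (2, 0)
    [(0.0, 1.0), (0.0, 1.0)],  # ends at (0, 2)
    [(1.0, 0.0), (0.0, 1.0)],  # ends at (1, 1), which is the goal
]
ranking = rank_trajectories(toy_world_model, (0.0, 0.0), (1.0, 1.0), candidates)
```

In this toy setup the third candidate ranks first because its final state coincides with the goal. Swapping in a learned video model and an image similarity measure would leave the structure of the loop unchanged.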