CVPR 2025

Lift3D Policy: Lifting 2D Foundation Models for Robust 3D Robotic Manipulation

Yueru Jia, Jiaming Liu, Sixiang Chen, Chenyang Gu, Zhilue Wang, Longzan Luo, Xiaoqi Li, Pengwei Wang, Zhongyuan Wang, Renrui Zhang, Shanghang Zhang

Abstract

[Figure 1 placeholder. Left part: overview of our proposed Lift3D, with panels (a) implicit 3D robotic representation and (b) explicit 3D robotic representation. Component labels: image, point cloud, robot state, 2D foundation model (guide), MAE decoder, reconstructing depth, 2D position embedding, attention, policy head, execution. Right part: extensive evaluation on simulation and real-world tasks. Benchmarks: Adroit (MuJoCo, dexterous hands), MetaWorld (MuJoCo, gripper), RLBench (CoppeliaSim, gripper). Example tasks: place bottle at rack, pour water, unplug charge, slide block, pick and place, stack blocks, water plants, wipe table, open drawer, …]

Figure 1. Lift3D empowers 2D foundation models with 3D manipulation capabilities by refining implicit 3D robotic representations through task-related affordance masking and depth reconstruction, while enhancing explicit 3D robotic representations by leveraging the pretrained 2D positional embeddings to encode point clouds. Lift3D achieves robustness and surprising effectiveness in diverse simulation and real-world tasks.