ICLR 2026

D-REX: Differentiable Real-to-Sim-to-Real Engine for Learning Dexterous Grasping

Haozhe Lou, Mingtong Zhang, Haoran Geng, Hanyang Zhou, Sicheng He, Zhiyuan Gao, Siheng Zhao, Jiageng Mao, Pieter Abbeel, Jitendra Malik, Daniel Seita, Yue Wang



Abstract

Simulation provides a cost-effective and flexible platform for data generation and policy learning when developing robotic systems. However, bridging the gap between simulated and real-world dynamics remains a significant challenge, especially for physical parameter identification. In this work, we introduce a real-to-sim-to-real engine that leverages Gaussian Splat representations to build a differentiable engine. The engine identifies object mass from real-world visual observations and robot control signals and simultaneously supports grasping policy learning. By optimizing the mass of the manipulated object, our method automatically builds high-fidelity, physically plausible digital twins. Additionally, we propose a novel approach to train force-aware grasping policies from limited data by transferring feasible human demonstrations into simulated robot demonstrations. Through comprehensive experiments, we demonstrate that our engine achieves accurate and robust mass identification across various object geometries and mass values. The optimized mass values facilitate force-aware policy learning, yielding strong object-grasping performance and effectively reducing the sim-to-real gap. Our code and project page are available at drex.github.io.

[Figure 1 panels: Real-to-Sim; Learning Robot Policy from Human Videos; Differentiable Engine; Mass Identification; Learning Object Mass from Robot Videos; 4D Gaussian Splats Rendering]

Figure 1: We present D-REX, a differentiable real-to-sim-to-real engine that enables 4D photorealistic rendering and physical simulation by identifying object mass from real-world visual observations and robot interaction data. D-REX reconstructs object geometry using Gaussian Splat representations and leverages a differentiable physics engine for end-to-end mass identification. The identified mass is then used to enable force-aware policy learning from human demonstrations, supporting robust grasping and sim-to-real transfer in dexterous grasping tasks.
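To illustrate the general idea of identifying mass by differentiating through a simulator, here is a toy sketch. It is not D-REX's engine or code. Several assumptions are ours. The object is a point mass lifted vertically. A known applied-force sequence stands in for the robot control signals, and observed heights stand in for object poses tracked from the Gaussian Splat reconstruction. Contact and friction are ignored. In place of a full differentiable physics engine, derivatives come from hand-rolled forward-mode dual numbers. The fit is Gauss-Newton on log-mass, which keeps the estimate positive. The names `Dual`, `simulate`, and `identify_mass` are hypothetical.

```python
import math


class Dual:
    """Forward-mode dual number: a value plus its derivative w.r.t. one parameter."""

    def __init__(self, val, der=0.0):
        self.val, self.der = float(val), float(der)

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __mul__(self, other):
        other = Dual.lift(other)
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

    def __truediv__(self, other):
        other = Dual.lift(other)
        return Dual(self.val / other.val,
                    (self.der * other.val - self.val * other.der) / other.val ** 2)

    def __rtruediv__(self, other):
        return Dual.lift(other) / self


def dexp(x):
    """exp() for dual numbers (chain rule: d exp(x) = exp(x) dx)."""
    e = math.exp(x.val)
    return Dual(e, e * x.der)


def simulate(mass, forces, dt=0.01, g=9.81):
    """Semi-implicit Euler rollout of a lifted point mass: m * a = F_k - m * g.

    Works with float or Dual mass, so the same code gives values and derivatives.
    """
    x, v, traj = 0.0, 0.0, []
    for f in forces:
        a = f / mass - g
        v = v + a * dt
        x = x + v * dt
        traj.append(x)
    return traj


def identify_mass(forces, observed, init_mass=0.5, iters=50, tol=1e-12):
    """Gauss-Newton on theta = log(m), using d(residual)/d(theta) from forward mode."""
    theta = math.log(init_mass)
    for _ in range(iters):
        mass = dexp(Dual(theta, 1.0))  # seed d(theta)/d(theta) = 1
        residuals = [x - obs for x, obs in zip(simulate(mass, forces), observed)]
        num = sum(r.val * r.der for r in residuals)
        den = sum(r.der ** 2 for r in residuals)
        step = -num / den  # one-parameter Gauss-Newton update
        theta += step
        if abs(step) < tol:
            break
    return math.exp(theta)


if __name__ == "__main__":
    # Synthetic "real" trajectory from a 1.2 kg object under a varying lift force.
    forces = [14.0 + 2.0 * math.sin(0.1 * k) for k in range(100)]
    observed = simulate(1.2, forces)
    print(f"{identify_mass(forces, observed, init_mass=0.5):.4f}")  # prints 1.2000
```

In this toy model the simulated height is linear in 1/m, so the log-space Gauss-Newton iteration converges quickly from initial guesses on either side of the true mass. A real engine with contacts and rendered observations would instead backpropagate through many more state variables and a learned or rendered observation model.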