CVPR 2025

Any6D: Model-free 6D Pose Estimation of Novel Objects

Taeyeop Lee, Bowen Wen, Minjun Kang, Gyuree Kang, In So Kweon, Kuk-Jin Yoon

Abstract

We introduce Any6D, a model-free framework for 6D object pose estimation that requires only a single RGB-D anchor image to estimate both the 6D pose and size of unknown objects in novel scenes. Unlike existing methods that rely on textured 3D models or multiple viewpoints, Any6D leverages a joint object alignment process to enhance 2D-3D alignment and metric scale estimation for improved pose accuracy. Our approach integrates a render-and-compare strategy to generate and refine pose hypotheses, enabling robust performance in scenarios with occlusions, non-overlapping views, diverse lighting conditions, and large cross-environment variations. We evaluate our method on five challenging datasets (REAL275, Toyota-Light, HO3D, YCBInEOAT, and LM-O), demonstrating that it significantly outperforms state-of-the-art methods for novel object pose estimation.

Project page: https://taeyeop.com/any6d

Recent research has shifted toward category-agnostic approaches [25, 30, 47, 52, 58, 70, 71] to address the limitations of both category-level and instance-level pose estimation. These efforts can be broadly divided into two directions: model-based methods [30, 47, 52], which require textured RGB 3D CAD models at test time, and model-free methods [16, 39, 59, 77], which utilize multi-view reference images or video sequences of the target object during inference. Although both approaches show promising results, they still face significant practical limitations when dealing with unseen objects that are not physically accessible. In robotic manipulation scenarios, for example, these methods

This CVPR paper is the Open Access version, provided by the Computer Vision Foundation. Except for this watermark, it is identical to the accepted version; the final published version of the proceedings is available on IEEE Xplore.