NeurIPS 2023

Learning Environment-Aware Affordance for 3D Articulated Object Manipulation under Occlusions

Ruihai Wu, Kai Cheng, Yan Zhao, Chuanruo Ning, Guanqi Zhan, Hao Dong

Cited by 38

Abstract

Perceiving and manipulating 3D articulated objects in diverse environments is essential for home-assistant robots. Recent studies have shown that point-level affordance provides actionable priors for downstream manipulation tasks. However, existing works primarily focus on single-object scenarios with homogeneous agents, overlooking the realistic constraints imposed by the environment and the agent's morphology, e.g., occlusions and physical limitations. In this paper, we propose an environment-aware affordance framework that incorporates both object-level actionable priors and environment constraints. Unlike object-centric affordance approaches, learning environment-aware affordance faces the challenge of combinatorial explosion due to the complexity of various occlusions, characterized by their quantities, geometries, positions and poses. To address this and enhance data efficiency, we introduce a novel contrastive affordance learning framework capable of training on scenes containing a single occluder and generalizing to scenes with complex occluder combinations. Experiments demonstrate the effectiveness of our proposed approach in learning affordance considering environment constraints.

Introduction

Articulated objects, such as doors and drawers, exist everywhere in our daily life. Perceiving and manipulating these objects present crucial yet challenging tasks in computer vision and robotics. Unlike rigid objects, articulated objects exhibit diverse articulation types and functionally important articulated parts crucial for human and robot interactions. Numerous research endeavors have been investigating articulated objects broadly, encompassing joint parameters estimation [41, 47], part pose estimation [22, 23], kinematic structure estimation [34, 33], digital twins generalization [13, 10], articulated part robotic manipulation [25, 44, 46, 4] and few-shot policy adaptation [42].

However, most existing works for manipulating articulated objects primarily focus on single-object scenarios with homogeneous agents, such as flying grippers [46, 25, 44] or fixed-position robot arms [6]. Consequently, these approaches tend to develop object-centric representations and policies, neglecting the realistic constraints imposed by both the environment and the agent's morphology. These constraints are commonplace in real-world scenarios and their oversight limits the applicability and performance of the manipulation tasks. For example, successfully opening a cabinet door that is obstructed by occluders not only depends on the properties of the target door but also heavily relies on the robot's position and the way it interacts (e.g., colliding or bypassing) with the occluders.

We take a significant step towards manipulating articulated objects in a more realistic setting, i.e., considering constraints imposed by the environment and robot. Such a task encounters the combi-

* Equal contribution. Author ordering determined by coin flip.