ICLR 2025

A Single Goal is All You Need: Skills and Exploration Emerge from Contrastive RL without Rewards, Demonstrations, or Subgoals

Grace Liu, Michael Tang, Benjamin Eysenbach

Abstract

In this paper, we present empirical evidence of skills and directed exploration emerging from a simple RL algorithm long before any successful trials are observed. For example, in a manipulation task, the agent is given a single observation of the goal state (see Fig. 1) and learns skills in sequence: first moving its end-effector, then pushing the block, and finally lifting and placing the block. These skills emerge before the agent has ever successfully placed the block at the goal location, and without the aid of any reward functions, demonstrations, or manually specified distance metrics. Implementing our method involves a simple modification of prior work and does not require density estimates, ensembles, or any additional hyperparameters. We lack a clear theoretical understanding of why the method works so effectively, though our experiments provide some hints. Videos and code: https://graliuce.github.io/sgcrl

[Figure 1 image: Sawyer bin task, training progress. Milestone labels: open/close gripper; push object; roll object between bins; pick and place; move hand left/right; move hand front/back; pick up the object; knock object out of bin.]

Figure 1: Skills and Directed Exploration Emerge. In this bin-picking task, we provide the agent with a single goal observation in which the green block is in the left bin. The agent never receives any rewards. Over the course of training, the agent learns skills of increasing complexity. Easier skills seem to enable the agent to unlock more complex ones: moving the hand is a prerequisite for pushing the object; closing the gripper is a prerequisite for picking up the object, which in turn is a prerequisite for moving the object. Figures 7 and 8 in Appendix D show similar progress milestones for other tasks.