NeurIPS 2021

A Max-Min Entropy Framework for Reinforcement Learning

Seungyul Han, Youngchul Sung

Cited by 38

Abstract

In this paper, we propose a max-min entropy framework for reinforcement learning (RL) to overcome the limitations of the soft actor-critic (SAC) algorithm, which implements maximum entropy RL, in model-free sample-based learning with function approximation. Whereas conventional maximum entropy RL guides learning toward policies that visit states with high entropy and maximize the entropy of those high-entropy states, so that the entropy of the entire trajectory is high, the proposed max-min entropy framework aims to learn policies that visit states with low entropy and maximize the entropy of these low-entropy states to promote better exploration. For general Markov decision processes (MDPs), we construct an efficient algorithm under the proposed framework: a practical iterative actor-critic algorithm based on policy iteration with disentangled exploration and exploitation. Numerical results show that the proposed algorithm significantly enhances exploration capability, owing to the fairness across states induced by the max-min framework, and yields drastic performance improvement over existing state-of-the-art RL algorithms, including maximum-entropy SAC, on difficult control tasks.
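
For reference, the maximum entropy RL objective that the abstract contrasts against is commonly written as below. This is the standard discounted form from the SAC literature, not a formula taken from this abstract; the symbols (reward r, discount factor γ, temperature α, policy entropy H) are assumed notation. The max-min objective proposed in the paper is not reproduced here.

```latex
% Standard maximum entropy RL objective (assumed notation, as in SAC):
% the policy is rewarded both for return and for the entropy of its
% action distribution at every visited state.
J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[ \sum_{t=0}^{\infty} \gamma^{t}
  \Big( r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[ \log \pi(a \mid s) \big].
```

Because the entropy term is summed along the trajectory, a policy maximizing this objective is drawn toward states where its own entropy is high, which is the behavior the max-min framework is designed to change.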