NeurIPS 2023

Robust Knowledge Transfer in Tiered Reinforcement Learning

Jiawei Huang, Niao He

Cited by 1

Abstract

In this paper, we study the Tiered Reinforcement Learning setting, a parallel transfer learning framework, where the goal is to transfer knowledge from the low-tier (source) task to the high-tier (target) task to reduce the exploration risk of the latter while solving the two tasks in parallel. Unlike previous work, we do not assume the low-tier and high-tier tasks share the same dynamics or reward functions, and focus on robust knowledge transfer without prior knowledge of the task similarity. We identify a natural and necessary condition called the "Optimal Value Dominance" for our objective. Under this condition, we propose novel online learning algorithms that, for the high-tier task, achieve constant regret on a subset of states depending on the task similarity and retain near-optimal regret when the two tasks are dissimilar, while for the low-tier task, they remain near-optimal without any sacrifice. We further study the setting with multiple low-tier tasks, and propose a novel transfer source selection mechanism, which can ensemble the information from all low-tier tasks and allow provable benefits on a much larger state-action space.

Introduction

Compared with learning each task individually from scratch, transferring knowledge from other similar tasks or side information has proven to be an effective way to reduce the exploration risk and improve sample efficiency in Reinforcement Learning (RL). Multi-Task RL (MT-RL) [29] and Transfer RL [26, 18, 37] are two mainstream knowledge transfer frameworks; however, both are subject to limitations when dealing with real-world scenarios.

MT-RL studies the setting where a set of similar tasks are solved concurrently, and the main objective is to accelerate learning by sharing information across all tasks. In practice, however, the tasks in many MT-RL scenarios are not equally important, and we are more interested in the performance of certain tasks.
For example, in robot learning, a few robots are more valuable and harder to repair, while the others are cheaper or are merely simulators. Most existing works on MT-RL treat all tasks equally and focus primarily on reducing the total regret of all tasks as a whole [3, 8, 36, 11], with no guarantee of improving a particular task.

In contrast, transfer RL distinguishes the priority of different tasks by categorizing them into source and target tasks, and aims at transferring the knowledge from source tasks (or some side information like value predictors) to facilitate the learning of target tasks [21, 27, 9, 10]. However, a key assumption in transfer RL is that the source task is completely solved before the learning of the target task begins, and this is not always practical. For example, in some sim-to-real domains, the source task simulator may require a long time to solve [5], and in some user-interaction scenarios [13], the source and target tasks refer to different user groups that have to be served simultaneously. In these cases, it is more reasonable to solve the source and target tasks in parallel and transfer the information as soon as it becomes available.

Recently, [13] proposed a new "parallel knowledge transfer" framework, called Tiered RL, which promises to fill this gap. Tiered RL considers the case when a source task M_Lo and a target task

37th Conference on Neural Information Processing Systems (NeurIPS 2023).