NeurIPS 2021

Provably efficient multi-task reinforcement learning with model transfer

Chicheng Zhang, Zhi Wang

Cited by 19

Abstract

We study multi-task reinforcement learning (RL) in tabular episodic Markov decision processes (MDPs). We formulate a heterogeneous multi-player RL problem, in which a group of players concurrently face similar but not necessarily identical MDPs, with a goal of improving their collective performance through inter-player information sharing. We design and analyze an algorithm based on the idea of model transfer, and provide gap-dependent and gap-independent upper and lower bounds that characterize the intrinsic complexity of the problem.

Algorithm 1: MULTI-TASK-EULER
Input: failure probability δ ∈ (0, 1), dissimilarity parameter ε ≥ 0.
Initialize: set V_p(⊥) = 0 for all p ∈ [M], where ⊥ is the only state in S_{H+1}.
for k = 1, 2, ..., K do
    for p = 1, 2, ..., M do
        // Construct optimal value estimates for player p
        Update the upper and lower bound estimates of the optimal action-value function.
    // All players p interact with their respective environments, and update reward and transition estimates
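The bookkeeping behind "update reward and transition estimates" and inter-player information sharing can be illustrated with a minimal Python sketch. This is not the paper's MULTI-TASK-EULER: the update equations and confidence bounds are not reproduced in this excerpt, and the class name `SharedModelEstimates` and all of its methods are hypothetical. It assumes each player keeps its own empirical averages and that a pooled average over all players is also available.

```python
from collections import defaultdict


class SharedModelEstimates:
    """Illustrative per-player and pooled empirical model estimates.

    Hypothetical helper, not taken from the paper: it only tracks counts
    and averages, with no confidence bounds or value iteration.
    """

    def __init__(self, num_players):
        # Visit counts and reward sums keyed by (state, action), one dict per player.
        self.counts = [defaultdict(int) for _ in range(num_players)]
        self.reward_sums = [defaultdict(float) for _ in range(num_players)]
        # Transition counts keyed by (state, action, next_state), one dict per player.
        self.next_counts = [defaultdict(int) for _ in range(num_players)]

    def update(self, p, s, a, r, s_next):
        """Record one observed step (s, a, r, s_next) for player p."""
        self.counts[p][(s, a)] += 1
        self.reward_sums[p][(s, a)] += r
        self.next_counts[p][(s, a, s_next)] += 1

    def individual_reward(self, p, s, a):
        """Empirical mean reward using only player p's own data."""
        n = self.counts[p].get((s, a), 0)
        return self.reward_sums[p].get((s, a), 0.0) / n if n else 0.0

    def pooled_reward(self, s, a):
        """Empirical mean reward using every player's data."""
        n = sum(c.get((s, a), 0) for c in self.counts)
        total = sum(rs.get((s, a), 0.0) for rs in self.reward_sums)
        return total / n if n else 0.0

    def pooled_transition(self, s, a, s_next):
        """Empirical probability of s_next after (s, a), pooled over players."""
        n = sum(c.get((s, a), 0) for c in self.counts)
        hits = sum(nc.get((s, a, s_next), 0) for nc in self.next_counts)
        return hits / n if n else 0.0


# Two players observe the same (state, action) pair in their own environments.
est = SharedModelEstimates(num_players=2)
est.update(0, "s0", "a", 1.0, "s1")
est.update(1, "s0", "a", 0.0, "s2")
est.update(1, "s0", "a", 0.5, "s1")
```

The pooled averages draw on more samples than any single player's, but when the players' MDPs differ they estimate a mixture of the players' models rather than any one of them. A dissimilarity parameter such as ε is what lets an algorithm account for that mismatch.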