ICLR 2022

Distributional Reinforcement Learning with Monotonic Splines

Yudong Luo, Guiliang Liu, Haonan Duan, Oliver Schulte, Pascal Poupart

Cited by 18

Abstract

One key challenge in quantile-based distributional RL is how to parameterize the quantile function when minimizing the Wasserstein metric of temporal differences. Existing algorithms use step functions or piecewise-linear functions. We propose to learn smooth, continuous quantile functions represented by monotonic rational-quadratic splines.
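To make the idea concrete, here is a minimal NumPy sketch of a monotonic rational-quadratic spline used as a quantile function q(τ) on τ ∈ [0, 1]. It uses the standard rational-quadratic interpolant (the form popularized by neural spline flows). The parameterization in the hypothetical helper `build_knots` (softmax bin widths, softplus height increments and knot derivatives, a free minimum `y_min`) and all function names are illustrative assumptions, not necessarily the paper's exact construction. Because every bin width, height increment, and knot derivative is positive, the spline is strictly increasing, so the represented quantiles cannot cross.

```python
import numpy as np


def softplus(z):
    # Numerically stable log(1 + exp(z)); keeps increments and slopes positive.
    return np.logaddexp(0.0, z)


def build_knots(w_logits, h_raw, d_raw, y_min):
    """Map unconstrained parameters to a valid monotonic spline.

    Hypothetical parameterization for illustration only.
    w_logits, h_raw: shape (K,); d_raw: shape (K + 1,).
    """
    widths = np.exp(w_logits - w_logits.max())
    widths /= widths.sum()                       # K positive bin widths summing to 1
    xs = np.concatenate([[0.0], np.cumsum(widths)])
    xs[-1] = 1.0                                 # guard against rounding error
    # Strictly increasing knot heights (quantile values), starting at y_min.
    ys = y_min + np.concatenate([[0.0], np.cumsum(softplus(h_raw))])
    ds = softplus(d_raw)                         # K + 1 positive knot derivatives
    return xs, ys, ds


def rq_spline(tau, xs, ys, ds):
    """Evaluate a monotonic rational-quadratic spline at tau in [0, 1]."""
    tau = np.asarray(tau, dtype=float)
    # Index k of the bin [xs[k], xs[k + 1]] that contains each tau.
    k = np.clip(np.searchsorted(xs, tau, side="right") - 1, 0, len(xs) - 2)
    x0, x1 = xs[k], xs[k + 1]
    y0, y1 = ys[k], ys[k + 1]
    d0, d1 = ds[k], ds[k + 1]
    s = (y1 - y0) / (x1 - x0)                    # average slope of the bin
    xi = (tau - x0) / (x1 - x0)                  # relative position inside the bin
    num = (y1 - y0) * (s * xi**2 + d0 * xi * (1 - xi))
    den = s + (d1 + d0 - 2 * s) * xi * (1 - xi)  # always > 0 when s, d0, d1 > 0
    return y0 + num / den


# Usage: a random 8-bin spline evaluated on a grid of quantile levels.
rng = np.random.default_rng(0)
K = 8
xs, ys, ds = build_knots(rng.normal(size=K), rng.normal(size=K),
                         rng.normal(size=K + 1), y_min=-2.0)
taus = np.linspace(0.0, 1.0, 201)
q = rq_spline(taus, xs, ys, ds)
print(q[:5])
```

Unlike a step function (piecewise constant) or a piecewise-linear interpolant, adjacent bins here share the derivative at each knot, so the resulting quantile function is smooth (continuously differentiable) as well as monotonic.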