NeurIPS 2021

Weighted model estimation for offline model-based reinforcement learning

Toru Hishinuma, Kei Senda

Cited by 15

Abstract

Key idea

• Importance-weighted model estimation can improve predictive performance under covariate shift. The weighted objective is $\sum w(s, a) \ln P_\theta(s' \mid s, a)$.
• Natural idea: $w(s, a) = \dfrac{\text{distribution of real future data}}{\text{distribution of offline data}}$. We cannot access real future data. Estimating this weight is one of the challenges in off-policy evaluation in RL. This weight is not easy to use.
• Our idea: $w(s, a) = \dfrac{\text{distribution of simulated future data}}{\text{distribution of offline data}}$.
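To make the weighted objective concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear-Gaussian model with fixed noise variance, for which maximizing $\sum w(s, a) \ln P_\theta(s' \mid s, a)$ reduces to weighted least squares. The toy dynamics, the target state distribution, and the helper names `weighted_gaussian_mle` and `gaussian_pdf` are all hypothetical. The density ratio is computed in closed form only because both distributions are chosen by hand here; in the setting described above, the weight would have to be estimated.

```python
import numpy as np


def weighted_gaussian_mle(X, Y, w):
    """Maximize sum_i w_i * ln N(Y_i | X_i @ theta, sigma^2 I) over theta.

    With a fixed noise variance this is weighted least squares, solved here
    by scaling each row by sqrt(w_i) and calling an ordinary solver.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    w = np.asarray(w, dtype=float)
    sw = np.sqrt(w)[:, None]
    theta, *_ = np.linalg.lstsq(sw * X, sw * Y, rcond=None)
    return theta


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


rng = np.random.default_rng(0)
n = 2000

# Toy offline data (assumption): states and actions drawn from N(0, 1),
# with nonlinear true dynamics so that a linear model is misspecified.
s = rng.normal(0.0, 1.0, size=n)
a = rng.normal(0.0, 1.0, size=n)
s_next = np.sin(s) + 0.5 * a + 0.05 * rng.normal(size=n)

X = np.column_stack([s, a, np.ones(n)])  # features: state, action, bias
Y = s_next[:, None]

# Hypothetical target state distribution N(1.5, 0.5^2). The weight is the
# ratio of target density to offline density at each offline state.
w = gaussian_pdf(s, 1.5, 0.5) / gaussian_pdf(s, 0.0, 1.0)

theta_unweighted = weighted_gaussian_mle(X, Y, np.ones(n))
theta_weighted = weighted_gaussian_mle(X, Y, w)


def weighted_sq_error(theta):
    """Weighted squared prediction error on the offline data."""
    return float(np.sum(w[:, None] * (X @ theta - Y) ** 2))


print("unweighted fit, weighted error:", weighted_sq_error(theta_unweighted))
print("weighted fit,   weighted error:", weighted_sq_error(theta_weighted))
```

By construction the weighted fit attains a weighted error no larger than the unweighted fit, which is the sense in which the weighting shifts model accuracy toward the region the weight emphasizes.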