NeurIPS 2021
Weighted model estimation for offline model-based reinforcement learning
Toru Hishinuma, Kei Senda
15 citations
Abstract
Key idea
• Importance-weighted model estimation can improve predictive performance under covariate shift. The weighted objective is $\sum w(s, a) \ln P_\theta(s' \mid s, a)$.
• Natural idea: $w(s, a) = \frac{\text{distribution of real future data}}{\text{distribution of offline data}}$. We cannot access real future data. Estimating this weight is one of the challenges in off-policy evaluation in RL. This weight is not easy to use.
• Our idea: $w(s, a) = \frac{\text{distribution of simulated future data}}{\text{distribution of offline data}}$.
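To make the weighted objective concrete, here is a minimal sketch of importance-weighted maximum likelihood. It is not the authors' implementation. It assumes a linear-Gaussian transition model with fixed noise scale, for which maximizing the weighted log-likelihood reduces to weighted least squares. The function names `weighted_log_likelihood` and `fit_weighted_model` are hypothetical, and the weights `w` are taken as given: how to estimate the ratio of simulated future data to offline data is not shown here.

```python
import numpy as np


def weighted_log_likelihood(theta, sa, s_next, w, sigma=1.0):
    """Return sum_i w_i * ln P_theta(s'_i | s_i, a_i).

    Assumed model (illustrative only): s' ~ N(sa @ theta, sigma^2 * I),
    where each row of `sa` is a concatenated state-action feature vector.
    """
    resid = s_next - sa @ theta
    d = s_next.shape[1]
    log_p = (
        -0.5 * np.sum(resid ** 2, axis=1) / sigma ** 2
        - 0.5 * d * np.log(2.0 * np.pi * sigma ** 2)
    )
    return float(np.sum(w * log_p))


def fit_weighted_model(sa, s_next, w):
    """Maximize the weighted log-likelihood in closed form.

    For the Gaussian model above with fixed sigma, this is weighted least
    squares: scale each row by sqrt(w_i) and solve ordinary least squares.
    """
    sw = np.sqrt(w)[:, None]
    theta, *_ = np.linalg.lstsq(sw * sa, sw * s_next, rcond=None)
    return theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sa = rng.normal(size=(200, 3))            # offline (s, a) features
    true_theta = rng.normal(size=(3, 2))
    s_next = sa @ true_theta + 0.1 * rng.normal(size=(200, 2))
    # Placeholder weights; in practice these would be estimated density ratios.
    w = rng.uniform(0.1, 2.0, size=200)
    theta_hat = fit_weighted_model(sa, s_next, w)
    print(weighted_log_likelihood(theta_hat, sa, s_next, w))
```

With all weights equal to one, the fit reduces to ordinary (unweighted) maximum likelihood, which is the baseline that importance weighting is meant to improve on under covariate shift.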