NeurIPS 2021

Weighted model estimation for offline model-based reinforcement learning

Toru Hishinuma, Kei Senda

15 citations

Abstract

Key idea

• Importance-weighted model estimation can improve predictive performance under covariate shift. The model is fit by maximizing the weighted log-likelihood $\sum w(s, a) \ln P_\theta(s' \mid s, a)$.
• Natural idea: $w(s, a) = \dfrac{\text{distribution of real future data}}{\text{distribution of offline data}}$. We cannot access real future data, and estimating this weight is one of the challenges in off-policy evaluation in RL, so this weight is not easy to use.
• Our idea: $w(s, a) = \dfrac{\text{distribution of simulated future data}}{\text{distribution of offline data}}$.
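As a rough illustration of the weighted objective, the sum of w(s, a) ln P_θ(s' | s, a) over offline transitions, the following minimal sketch fits a linear-Gaussian dynamics model with fixed noise variance, for which maximizing the weighted log-likelihood reduces to weighted least squares. The linear-Gaussian model, the synthetic data, the function names, and especially the placeholder weights are assumptions made for illustration only; they are not the authors' model or their weight estimator.

```python
import numpy as np


def weighted_gaussian_mle(X, Y, w):
    """Maximize sum_i w_i * ln N(y_i | x_i @ theta, sigma^2 I) over theta.

    With a fixed noise variance this is weighted least squares, solved
    here through the normal equations (X^T W X) theta = X^T W Y.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    w = np.asarray(w, dtype=float)
    Xw = X * w[:, None]  # scale each row of X by its weight
    return np.linalg.solve(Xw.T @ X, Xw.T @ Y)


def weighted_log_likelihood(theta, X, Y, w, sigma=1.0):
    """Evaluate sum_i w_i * ln P_theta(y_i | x_i) for the Gaussian model."""
    resid = Y - X @ theta
    d = Y.shape[1]
    log_p = (-0.5 * np.sum(resid ** 2, axis=1) / sigma ** 2
             - 0.5 * d * np.log(2.0 * np.pi * sigma ** 2))
    return float(np.sum(w * log_p))


# Synthetic offline data (hypothetical): 2-D states, 1-D actions.
rng = np.random.default_rng(0)
states = rng.normal(size=(200, 2))
actions = rng.normal(size=(200, 1))
X = np.hstack([states, actions])  # model inputs (s, a)
# Nonlinear true dynamics, so the linear model is misspecified and the
# choice of weights changes which region the fit prioritizes.
Y = np.sin(states) + 0.5 * actions + 0.1 * rng.normal(size=(200, 2))

# Placeholder weights standing in for a density ratio w(s, a).
# ASSUMPTION: in practice these would come from a ratio estimator.
w = np.exp(-0.5 * np.sum(X ** 2, axis=1))
w = w / w.mean()

theta_uniform = weighted_gaussian_mle(X, Y, np.ones(len(X)))
theta_weighted = weighted_gaussian_mle(X, Y, w)

print("weighted objective, uniform fit: ",
      weighted_log_likelihood(theta_uniform, X, Y, w))
print("weighted objective, weighted fit:",
      weighted_log_likelihood(theta_weighted, X, Y, w))
```

By construction, the weighted fit scores at least as well as the unweighted fit on the weighted objective; whether that translates into better predictions on future data depends on how well the weights approximate the intended density ratio.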