ICML2022

Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation

Daniel Vial, Advait Parulekar, Sanjay Shakkottai, R. Srikant

被引用 17 次

摘要

We propose two algorithms that use linear function approximation (LFA) for stochastic shortest path (SSP) and bound their regret over $K$ episodes. When all stationary policies are proper, our first algorithm obtains sublinear regret ( $K^{3/4}$ ), is computationally efficient, and uses stationary policies. This is the first LFA algorithm with these three properties, to the best of our knowledge. Our second algorithm improves the regret to $\sqrt{K}$ when the feature vectors satisfy certain assumptions. Both algorithms are special cases of a more general one, which has $\sqrt{K}$ regret for general features given access to a certain computation oracle. These algorithms and regret bounds are the first for SSP with function approximation.