ICML2022

Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation

Daniel Vial, Advait Parulekar, Sanjay Shakkottai, R. Srikant

被引用 17 次

摘要

We propose two algorithms that use linear function approximation (LFA) for stochastic shortest path (SSP) and bound their regret over KK episodes. When all stationary policies are proper, our first algorithm obtains sublinear regret (K3/4K^{3/4}), is computationally efficient, and uses stationary policies. This is the first LFA algorithm with these three properties, to the best of our knowledge. Our second algorithm improves the regret to K\sqrt{K} when the feature vectors satisfy certain assumptions. Both algorithms are special cases of a more general one, which has K\sqrt{K} regret for general features given access to a certain computation oracle. These algorithms and regret bounds are the first for SSP with function approximation.