ICML2022
Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation
Daniel Vial, Advait Parulekar, Sanjay Shakkottai, R. Srikant
17 citations
Abstract
We propose two algorithms that use linear function approximation (LFA) for stochastic shortest path (SSP) and bound their regret over episodes. When all stationary policies are proper, our first algorithm obtains sublinear regret (), is computationally efficient, and uses stationary policies. This is the first LFA algorithm with these three properties, to the best of our knowledge. Our second algorithm improves the regret to when the feature vectors satisfy certain assumptions. Both algorithms are special cases of a more general one, which has regret for general features given access to a certain computation oracle. These algorithms and regret bounds are the first for SSP with function approximation.