NeurIPS2023

Optimal Rates for Bandit Nonstochastic Control

Y. Jennifer Sun, Stephen H. Newman, Elad Hazan

7 citations

Abstract

Linear Quadratic Regulator (LQR) and Linear Quadratic Gaussian (LQG) control are foundational and extensively researched problems in optimal control. We investigate LQR and LQG problems with semi-adversarial perturbations and timevarying adversarial bandit loss functions. The best-known sublinear regret algorithm of Gradu et al. [2020] has a T 3 4 time horizon dependence, and the authors posed an open question about whether a tight rate of √ T could be achieved. We answer in the affirmative, giving an algorithm for bandit LQR and LQG which attains optimal regret (up to logarithmic factors) for both known and unknown systems. A central component of our method is a new scheme for bandit convex optimization with memory, which is of independent interest. 1 The LQR/LQG dynamics can be generalized to time-varying linear dynamical systems. Here we restrict ourselves to linear time-invariant systems for simplicity. 37th Conference on Neural Information Processing Systems (NeurIPS 2023).