ICLR2026

A Near-Optimal Best-of-Both-Worlds Algorithm for Federated Bandits

Zicheng Hu, Zihao Wang, Cheng Chen

18 citations

Abstract

We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set. We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments. The bound against oblivious adversaries is Õ( √ αT ), where T is the time horizon and α is the independence number of the feedback graph. The bound against stochastic environments is O (ln T ) 2 max S∈I(G) i∈S ∆ -1 i where I (G) is the family of all independent sets in a suitably defined undirected version of the graph and ∆ i are the suboptimality gaps. The algorithm combines ideas from the EXP3++ algorithm for stochastic and adversarial bandits and the EXP3.G algorithm for feedback graphs with a novel exploration scheme. The scheme, which exploits the structure of the graph to reduce exploration, is key to obtain best-of-both-worlds guarantees with feedback graphs. We also extend our algorithm and results to a setting where the feedback graphs are allowed to change over time.