NeurIPS2024

Taming Heavy-Tailed Losses in Adversarial Bandits and the Best-of-Both-Worlds Setting

Duo Cheng, Xingyu Zhou, Bo Ji

Abstract

In this paper, we study the multi-armed bandits problem in the best-of-both-worlds (BOBW) setting with heavy-tailed losses, where the losses can be negative and unbounded but have (1 + v ) -th raw moments bounded by u 1+ v for some known u > 0 and v ∈ (0 , 1] . Specifically, we consider the BOBW setting where the underlying environment can be either (oblivious) adversarial (i.e., the loss distribution can change arbitrarily over time) or stochastic (i.e., the loss distribution is fixed over time), which is unknown to the decision-maker a prior. We propose an algorithm and prove that it achieves a T 11+ v -type worst-case (pseudo-)regret in the adversarial regime and a log T -type gap-dependent regret in the stochastic regime, where T is the time horizon. Compared to the state-of-the-art results, our algorithm offers stronger high-probability regret guarantees (vs. expected regret guarantees), and more importantly, relaxes a strong technical assumption on the loss distribution, which is generally hard to verify in practice. As a byproduct, relaxing this assumption leads to the first near-optimal regret result for heavy-tailed bandits with Huber contamination in the adversarial regime (vs. the easier stochastic regime studied in all previous works). Our result also implies a high-probability BOBW regret guarantee when the bounded true losses are protected with pure Local Differential Privacy (LDP), while the existing work ensures the (weaker) approximate LDP with the regret bounds in expectation only.