NeurIPS2023

Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed Rewards

Bo Xue, Yimu Wang, Yuanyu Wan, Jinfeng Yi, Lijun Zhang

被引用 12 次

摘要

This paper investigates the problem of generalized linear bandits with heavy-tailed rewards, whose (1+ϵ)(1+\epsilon)-th moment is bounded for some ϵ(0,1]\epsilon\in (0,1]. Although there exist methods for generalized linear bandits, most of them focus on bounded or sub-Gaussian rewards and are not well-suited for many real-world scenarios, such as financial markets and web-advertising. To address this issue, we propose two novel algorithms based on truncation and mean of medians. These algorithms achieve an almost optimal regret bound of O~(dT11+ϵ)\widetilde{O}(dT^{\frac{1}{1+\epsilon}}), where dd is the dimension of contextual information and TT is the time horizon. Our truncation-based algorithm supports online learning, distinguishing it from existing truncation-based approaches. Additionally, our mean-of-medians-based algorithm requires only O(logT)O(\log T) rewards and one estimator per epoch, making it more practical. Moreover, our algorithms improve the regret bounds by a logarithmic factor compared to existing algorithms when ϵ=1\epsilon=1. Numerical experimental results confirm the merits of our algorithms.