NeurIPS2021

Differentially Private Multi-Armed Bandits in the Shuffle Model

Jay Tenenbaum, Haim Kaplan, Yishay Mansour, Uri Stemmer

被引用 35 次

摘要

We give an (ε,δ)(\varepsilon,\delta)-differentially private algorithm for the multi-armed bandit (MAB) problem in the shuffle model with a distribution-dependent regret of O((a[k]:Δa>0logTΔa)+klog1δlogTε)O\left(\left(\sum_{a\in [k]:\Delta_a>0}\frac{\log T}{\Delta_a}\right)+\frac{k\sqrt{\log\frac{1}{\delta}}\log T}{\varepsilon}\right), and a distribution-independent regret of O(kTlogT+klog1δlogTε)O\left(\sqrt{kT\log T}+\frac{k\sqrt{\log\frac{1}{\delta}}\log T}{\varepsilon}\right), where TT is the number of rounds, Δa\Delta_a is the suboptimality gap of the arm aa, and kk is the total number of arms. Our upper bound almost matches the regret of the best known algorithms for the centralized model, and significantly outperforms the best known algorithm in the local model.