NeurIPS2021

Optimal Best-Arm Identification Methods for Tail-Risk Measures

Shubhada Agrawal, Wouter M. Koolen, Sandeep Juneja

27 citations

Abstract

Conditional value-at-risk (CVaR) and value-at-risk (VaR) are popular tail-risk measures in finance and insurance industries as well as in highly reliable, safety-critical uncertain environments where often the underlying probability distributions are heavy-tailed. We use the multi-armed bandit best-arm identification framework and consider the problem of identifying the arm from amongst finitely many that has the smallest CVaR, VaR, or weighted sum of CVaR and mean. The latter captures the riskreturn trade-off common in finance. Our main contribution is an optimal δ-correct algorithm that acts on general arms, including heavy-tailed distributions, and matches the lower bound on the expected number of samples needed, asymptotically (as δ approaches 0). The algorithm requires solving a non-convex optimization problem in the space of probability measures, that requires delicate analysis. En-route, we develop new non-asymptotic empirical likelihood based concentration inequalities for tail-risk measures which are tighter than those for popular truncation-based empirical estimators. matches the lower bound as δ → 0, for CVaR, mean-CVaR, and VaR problems. The mean-CVaR problem is conceptually and technically similar to the CVaR problem. Hence, for simplicity of presentation, we primarily focus on the CVaR setting in the main text and give details of the mean-CVaR setting in Appendix H. We also spell out the somewhat analogous analysis for the VaR setting towards the end (Section 4.2), with details deferred to Appendix G. As is well known in the BAI MAB literature, the lower bound problem takes underlying arm distributions as inputs and solves for optimal weights that determine the proportion of samples that should be ideally allocated to each arm. The proposed algorithm uses a plugin strategy that at each sequential stage, modulo mild forced exploration, uses the generated empirical distributions as a proxy for the true distributions and arrives at weights that guide the sequential sampling strategy. Given η 1 , η 2 in P( ), let KL(η 1 , η 2 ) denote the KL-divergence between them, i.e., KL(η 1 , η 2 ) := log dη1 dη2 (y)dη 1 (y). Furthermore, let c π (η) and x π (η) denote the CVaR and VaR at the given confidence level π ∈ (0, 1), for the probability measure η (see Section 2 for definitions of these risk measures). CVaR problem: Given η ∈ P( ) and x ∈ , define functionals KL U inf : P( ) × -→ + , and KL L inf : P( ) × -→ + , where + denotes the non-negative reals, as See, [2, 31, 15] for related quantities. These projection functionals appear in the lower bound (Section 3), and are central to our plugin algorithm. Unlike in the mean case, KL U inf and KL L inf in (1) are not symmetric, and need to be studied separately. In particular, KL U inf is a convex optimization problem, while KL L inf is not. This is because c π (.) is a concave function, whence, the CVaR constraint in the KL L inf problem in (1) renders the feasible region non-convex (see Section 2). CVaR can be expressed as the optimal value of a minimization problem. This helped in re-expressing KL L inf as minimization over 2 variables, fixing one of which resulted in convex optimization over the other (see Section 3). For proving δ-correctness, we develop a new concentration inequality for weighted sums of these functionals (Proposition 4.2). Dual representations of these suggest natural candidates for supermartingales, whose mixtures help us in proving the concentration result. Similar inequalities were developed in [39, 20, 55] in different settings. See [41, Chapter 20] for an overview of the method of mixtures. We also propose KL U inf -and KL L inf -based tight anytime-valid confidence intervals for CVaR for heavy-tailed distributions, and show that our intervals typically have smaller width compared to those for the popular truncation-based empirical estimators (see Section 4.3). Since the distributions are no longer characterized by parameters, we work in the space of probability measures instead of in the Euclidean space. A key and non-trivial requirement for the proof of asymptotic optimality of the algorithm is the joint continuity of KL L inf and KL U inf in a well-chosen metric, which should generate a topology that is sufficiently fine to ensure this continuity, but coarse to ensure fast convergence of the empirical distributions to the true-arm distributions. We endow P( ) with the topology of weak convergence, or equivalently, with the Lévy metric (see Section 2 for definitions). Another nuance in our analysis is that the empirical distributions may not lie in L. This is handled by projecting them on to L under a suitable metric. Our proposed algorithm is a plugin strategy that involves solving the lower bound problem using the empirical distributions as a proxy for the actual arm distributions. This can be computationally demanding especially as the underlying samples in the empirical distribution become large. To ease the numerical b