ICLR2025
A Stochastic Approach to the Subset Selection Problem via Mirror Descent
Dan Greenstein, Elazar Gershuni, Ilan Ben-Bassat, Yaroslav Fyodorov, Ran Moshe, Fiana Raiber, Alex Shtoff, Oren Somekh, Nadav Hallak
Abstract
The subset selection problem is fundamental in machine learning and other fields of computer science. We introduce a stochastic formulation for the minimum cost subset selection problem in a black box setting, in which only the subset metric value is available. Subsequently, we can handle two-stage schemes, with an outer subset-selection component and an inner subset cost evaluation component. We propose formulating the subset selection problem in a stochastic manner by choosing subsets at random from a distribution whose parameters are learned. Two stochastic formulations are proposed. The first explicitly restricts the subset's cardinality, and the second yields the desired cardinality in expectation. The distribution is parameterized by a decision variable, which we optimize using Stochastic Mirror Descent. Our choice of distributions yields constructive closedform unbiased stochastic gradient formulas and convergence guarantees, including a rate with favorable dependency on the problem parameters. Empirical evaluation of selecting a subset of layers in transfer learning complements our theoretical findings and demonstrates the potential benefits of our approach. Central in many fields of science, such as machine learning and theoretical computer science, the subset selection problem is NP-hard (see Bar-Yehuda and Even (1981) ). In this paper, we propose Research partly conducted during Dan and Elazar's internships at Yahoo Research. Research conducted while Yaroslav, Ran, Fiana, Alex and Oren worked at Yahoo Research.