ICLR 2023
RandProx: Primal-Dual Optimization Algorithms with Randomized Proximal Updates
Laurent Condat, Peter Richtárik
Abstract
Proximal splitting algorithms are well suited to solving large-scale nonsmooth optimization problems, in particular those arising in machine learning. We propose a new primal-dual algorithm, in which the dual update is randomized; equivalently, the proximity operator of one of the functions in the problem is replaced by a stochastic oracle. For instance, some randomly chosen dual variables, instead of all, are updated at each iteration. Or, the proximity operator of a function is called with some small probability only. A nonsmooth variance-reduction technique is implemented so that the algorithm finds an exact minimizer of the general problem involving smooth and nonsmooth functions, possibly composed with linear operators. We derive linear convergence results in the presence of strong convexity; these results are new even in the deterministic case, when our algorithm reverts to the recently proposed Primal-Dual Davis-Yin algorithm. Some randomized algorithms of the literature are also recovered as particular cases (e.g., Point-SAGA). But our randomization technique is general and encompasses many unbiased mechanisms beyond sampling and probabilistic updates, including compression. Since the convergence speed depends on the slowest among the primal and dual contraction mechanisms, the iteration complexity might remain the same when randomness is used. On the other hand, the computation complexity can be significantly reduced. Overall, randomness helps to obtain faster algorithms. This has long been known for stochastic-gradient-type algorithms, and our work shows that this fully applies in the more general primal-dual setting as well.

For any function $\phi$ and parameter $\gamma > 0$, the proximity operator of $\gamma\phi$ is $\mathrm{prox}_{\gamma\phi} : x \in \mathcal{X} \mapsto \arg\min_{x' \in \mathcal{X}} \big( \gamma\phi(x') + \tfrac{1}{2}\|x' - x\|^2 \big)$. This operator has a closed form for many functions of practical interest (Parikh & Boyd, 2014; Pustelnik & Condat, 2017; Gheche et al., 2018); see also the website http://proximity-operator.net.
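In addition, the Moreau identity holds: for any $\gamma > 0$, $\mathrm{prox}_{\gamma\phi^*}(x) = x - \gamma\,\mathrm{prox}_{\phi/\gamma}(x/\gamma)$, where $\phi^* : x \in \mathcal{X} \mapsto \sup_{x' \in \mathcal{X}} \big( \langle x, x' \rangle - \phi(x') \big)$ denotes the conjugate function of $\phi$ (Bauschke & Combettes, 2017). Thus, one can compute the proximity operator of $\phi$ from that of $\phi^*$, and conversely.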
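As a numerical illustration of the Moreau identity, the sketch below checks it for one concrete choice that is not taken from the paper: $\phi = \|\cdot\|_1$ on $\mathbb{R}^d$. It assumes the scaled form of the identity, $\mathrm{prox}_{\gamma\phi^*}(x) = x - \gamma\,\mathrm{prox}_{\phi/\gamma}(x/\gamma)$, and uses two standard facts: the proximity operator of $\gamma\|\cdot\|_1$ is soft-thresholding, and the conjugate of the $\ell_1$ norm is the indicator function of the $\ell_\infty$ unit ball, whose proximity operator is the projection onto that ball. All function names are hypothetical.

```python
import random


def prox_l1(x, gamma):
    """Proximity operator of gamma * ||.||_1: entrywise soft-thresholding."""
    return [max(abs(v) - gamma, 0.0) * (1.0 if v > 0 else -1.0) for v in x]


def proj_linf_ball(x):
    """Projection onto the l-infinity unit ball.

    This is prox of gamma * phi^* for phi = ||.||_1 and any gamma > 0,
    because phi^* is the indicator function of that ball.
    """
    return [min(max(v, -1.0), 1.0) for v in x]


def prox_conjugate_via_moreau(prox_phi, x, gamma):
    """Evaluate prox_{gamma phi^*}(x) as x - gamma * prox_{phi/gamma}(x/gamma).

    prox_phi(z, t) is assumed to return the proximity operator of t * phi at z.
    """
    scaled = [v / gamma for v in x]
    p = prox_phi(scaled, 1.0 / gamma)
    return [v - gamma * q for v, q in zip(x, p)]


random.seed(0)
x = [random.gauss(0.0, 3.0) for _ in range(8)]
gamma = 0.7  # arbitrary positive parameter chosen for the check

direct = proj_linf_ball(x)
via_moreau = prox_conjugate_via_moreau(prox_l1, x, gamma)

# Largest entrywise discrepancy between the two evaluations;
# it should be at the level of floating-point rounding.
max_gap = max(abs(a - b) for a, b in zip(direct, via_moreau))
print(max_gap)
```

The practical point is the one made in the text: only one of the two proximity operators needs a closed form, since the other follows from the identity at the cost of a rescaling and a subtraction.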