NeurIPS 2020
Sinkhorn Barycenter via Functional Gradient Descent
Zebang Shen, Zhenfu Wang, Alejandro Ribeiro, Hamed Hassani
Cited by 10
Abstract
In this paper, we consider the problem of computing the barycenter of a set of probability distributions under the Sinkhorn divergence. This problem has recently found applications across various domains, including graphics, learning, and vision, as it provides a meaningful mechanism to aggregate knowledge. Unlike previous approaches, which operate directly in the space of probability measures, we recast the Sinkhorn barycenter problem as an instance of unconstrained functional optimization and develop a novel functional gradient descent method named Sinkhorn Descent (SD). We prove that SD converges to a stationary point at a sublinear rate, and under reasonable assumptions, we further show that it asymptotically finds a global minimizer of the Sinkhorn barycenter problem. Moreover, by providing a mean-field analysis, we show that SD preserves the weak convergence of empirical measures. Importantly, the computational complexity of SD scales linearly in the dimension $d$, and we demonstrate its scalability by solving a 100-dimensional Sinkhorn barycenter problem.

Analysis

In this section, we analyze the finite-time convergence and the mean-field limit of SD under the following assumptions on the ground cost function $c$ and the kernel function $k$ of the RKHS $\mathcal{H}^d$.

Assumption 4.1. The ground cost function $c(x, y)$ is bounded, i.e. $c(x, y) \le M_c$ for all $x, y \in \mathcal{X}$; $G_c$-Lipschitz continuous, i.e. $|c(x, y) - c(x', y)| \le G_c \|x - x'\|$ for all $x, x', y \in \mathcal{X}$; and $L_c$-Lipschitz smooth, i.e. $\|\nabla_1 c(x, y) - \nabla_1 c(x', y)\| \le L_c \|x - x'\|$ for all $x, x', y \in \mathcal{X}$.

Assumption 4.2. The kernel function $k(x, y)$ is bounded, i.e. $k(x, y) \le D_k$ for all $x, y \in \mathcal{X}$; and $G_k$-Lipschitz continuous, i.e. $|k(x, y) - k(x', y)| \le G_k \|x - x'\|$ for all $x, x', y \in \mathcal{X}$.
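The abstract and assumptions above describe SD only at a high level. As an illustration, here is a minimal NumPy sketch of a particle update in the spirit of SD. It is not the authors' implementation. It assumes the squared Euclidean cost $c(x, y) = \tfrac{1}{2}\|x - y\|^2$ (bounded only on a bounded $\mathcal{X}$, as Assumption 4.1 requires), a Gaussian kernel for $\mathcal{H}^d$ (bounded by $D_k = 1$), uniform empirical measures, and the common debiased form $S_\varepsilon(\alpha, \beta) = \mathrm{OT}_\varepsilon(\alpha, \beta) - \tfrac{1}{2}\mathrm{OT}_\varepsilon(\alpha, \alpha) - \tfrac{1}{2}\mathrm{OT}_\varepsilon(\beta, \beta)$ with barycenter objective $\sum_i \omega_i S_\varepsilon(\alpha, \beta_i)$. The function names (`sinkhorn_potentials`, `grad_potential`, `sd_step`) and the parameters `eps`, `eta`, `h` are hypothetical.

```python
import numpy as np

# Hypothetical sketch, not the authors' code. Assumptions: ground cost
# c(x, y) = 0.5 * ||x - y||^2 on a bounded domain, Gaussian RKHS kernel,
# uniform empirical measures, log-domain Sinkhorn for the dual potentials.


def _logsumexp(m, axis):
    # Numerically stable log(sum(exp(m))) along `axis`.
    mx = np.max(m, axis=axis, keepdims=True)
    return np.squeeze(mx, axis=axis) + np.log(np.sum(np.exp(m - mx), axis=axis))


def sq_cost(x, y):
    # Pairwise cost matrix C[i, j] = 0.5 * ||x_i - y_j||^2.
    return 0.5 * np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)


def sinkhorn_potentials(x, a, y, b, eps, n_iter=200):
    # Log-domain Sinkhorn iterations for OT_eps between sum_i a_i delta_{x_i}
    # and sum_j b_j delta_{y_j}; returns dual potentials f (on x) and g (on y).
    C = sq_cost(x, y)
    f, g = np.zeros(len(x)), np.zeros(len(y))
    for _ in range(n_iter):
        f = -eps * _logsumexp(np.log(b)[None, :] + (g[None, :] - C) / eps, axis=1)
        g = -eps * _logsumexp(np.log(a)[:, None] + (f[:, None] - C) / eps, axis=0)
    return f, g


def grad_potential(z, y, b, g, eps):
    # Gradient of the soft c-transform f(z) = -eps * log sum_j b_j exp((g_j - c(z, y_j)) / eps).
    # For the squared cost this is z - sum_j pi_j(z) y_j, with pi(z) a softmax over j.
    logits = np.log(b)[None, :] + (g[None, :] - sq_cost(z, y)) / eps
    pi = np.exp(logits - _logsumexp(logits, axis=1)[:, None])
    return z - pi @ y


def gaussian_kernel(x, y, h):
    # k(x, y) = exp(-||x - y||^2 / (2 h^2)); bounded by 1, as Assumption 4.2 requires.
    return np.exp(-np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1) / (2 * h ** 2))


def sd_step(x, targets, weights, eps=0.5, eta=0.5, h=2.0):
    # One particle update x <- x - eta * xi(x), where xi is the kernel-smoothed gradient
    # of the first variation sum_i w_i (f_{alpha, beta_i} - f_{alpha, alpha}).
    n = len(x)
    a = np.full(n, 1.0 / n)
    grad = np.zeros_like(x)
    for (y, b), w in zip(targets, weights):
        _, g_ab = sinkhorn_potentials(x, a, y, b, eps)
        grad += w * grad_potential(x, y, b, g_ab, eps)
    # The debiasing term -0.5 * OT_eps(alpha, alpha) contributes -f_{alpha, alpha}.
    _, g_aa = sinkhorn_potentials(x, a, x, a, eps)
    grad -= sum(weights) * grad_potential(x, x, a, g_aa, eps)
    # xi(x_i) = (1/n) sum_j k(x_i, x_j) grad(x_j): the RKHS-smoothed direction.
    xi = gaussian_kernel(x, x, h) @ grad / n
    return x - eta * xi


# Usage: barycenter of two point clouds centred near (-2, 0) and (2, 0), equal weights.
rng = np.random.default_rng(0)
beta1 = rng.normal(size=(30, 2)) + np.array([-2.0, 0.0])
beta2 = rng.normal(size=(30, 2)) + np.array([2.0, 0.0])
b_unif = np.full(30, 1.0 / 30)
x = rng.normal(size=(30, 2)) + np.array([0.0, 6.0])  # particles start far from both targets
for _ in range(30):
    x = sd_step(x, [(beta1, b_unif), (beta2, b_unif)], [0.5, 0.5])
print(x.mean(axis=0))  # should drift toward the weighted mean of the targets
```

In this sketch each step solves a few Sinkhorn problems on $n \times m$ cost matrices; the dimension $d$ enters only through the cost and kernel evaluations, so the per-step work grows linearly in $d$, which is consistent with (but does not reproduce) the complexity claim in the abstract.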