NeurIPS 2024

The Space Complexity of Approximating Logistic Loss

Gregory Dexter, Petros Drineas, Rajiv Khanna

Abstract

We provide space complexity lower bounds for data structures that approximate logistic loss up to $\epsilon$-relative error on a logistic regression problem with data $\mathbf{X} \in \mathbb{R}^{n \times d}$ and labels $\mathbf{y} \in \{-1,1\}^n$. The space complexity of existing coreset constructions depends on a natural complexity measure $\mu_\mathbf{y}(\mathbf{X})$, first defined in (Munteanu, 2018). We give an $\tilde{\Omega}(\frac{d}{\epsilon^2})$ space complexity lower bound in the regime $\mu_\mathbf{y}(\mathbf{X}) = O(1)$, which shows that existing coresets are optimal in this regime up to lower-order factors. We also prove a general $\tilde{\Omega}(d \cdot \mu_\mathbf{y}(\mathbf{X}))$ space lower bound when $\epsilon$ is constant, showing that the dependency on $\mu_\mathbf{y}(\mathbf{X})$ is not an artifact of mergeable coresets. Finally, we refute a prior conjecture that $\mu_\mathbf{y}(\mathbf{X})$ is hard to compute by providing an efficient linear programming formulation, and we empirically compare our algorithm to prior approximate methods.
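To make the notion of $\epsilon$-relative error concrete, the following is a minimal sketch, not the authors' code. It assumes the standard logistic loss $\mathcal{L}(\beta) = \sum_{i=1}^{n} \log(1 + e^{-y_i \mathbf{x}_i^\top \beta})$ and reads "$\epsilon$-relative error" as $|\tilde{\mathcal{L}}(\beta) - \mathcal{L}(\beta)| \le \epsilon \, \mathcal{L}(\beta)$. The function names and the reweighted uniform subsample used as a stand-in approximation are hypothetical and chosen only for illustration; they are not a coreset construction from the paper.

```python
import numpy as np


def logistic_loss(X, y, beta, weights=None):
    """(Weighted) sum over rows of log(1 + exp(-y_i * <x_i, beta>))."""
    margins = y * (X @ beta)
    # np.logaddexp(0, -m) is a numerically stable log(1 + exp(-m)).
    per_row = np.logaddexp(0.0, -margins)
    if weights is not None:
        per_row = weights * per_row
    return float(np.sum(per_row))


def is_eps_approximation(approx, exact, eps):
    """Check |approx - exact| <= eps * exact (assumed reading of eps-relative error)."""
    return abs(approx - exact) <= eps * exact


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 10_000, 5
    X = rng.standard_normal((n, d))
    y = rng.choice([-1.0, 1.0], size=n)   # labels in {-1, 1}^n
    beta = rng.standard_normal(d)

    exact = logistic_loss(X, y, beta)

    # Hypothetical stand-in for a compressed representation:
    # a uniform subsample of m rows, each reweighted by n / m.
    m = 1_000
    idx = rng.choice(n, size=m, replace=False)
    approx = logistic_loss(X[idx], y[idx], beta, weights=np.full(m, n / m))

    rel_err = abs(approx - exact) / exact
    print(f"exact={exact:.2f} approx={approx:.2f} relative error={rel_err:.4f}")
    print("within eps=0.1:", is_eps_approximation(approx, exact, 0.1))
```

The check here is for a single fixed $\beta$. A data structure of the kind the abstract studies has to answer such queries for any $\beta$, which is what makes the space lower bounds meaningful.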