ICML2022
Generic Coreset for Scalable Learning of Monotonic Kernels: Logistic Regression, Sigmoid and more
Elad Tolochinsky, Ibrahim Jubran, Dan Feldman
19 citations
Abstract
Coreset (or core-set) in this paper is a small weighted subset of the input set with respect to a given monotonic function that provably approximates its fitting loss to any given . Using we can obtain approximation of that minimizes this loss, by running existing optimization algorithms on . We provide: (I) a lower bound that proves that there are sets with no coresets smaller than , (II) a proof that a small coreset of size near-logarithmic in exists for any input , under natural assumption that holds e.g. for logistic regression and the sigmoid activation function. (III) a generic algorithm that computes in expected time, (IV) extensive experimental results with open code and benchmarks that show that the coresets are even smaller in practice. Existing papers (e.g.[Huggins,Campbell,Broderick 2016]) suggested only specific coresets for specific input sets.