NeurIPS2021

Linear and Kernel Classification in the Streaming Model: Improved Bounds for Heavy Hitters

Arvind V. Mahankali, David P. Woodruff

摘要

We study linear and kernel classification in the streaming model. For linear classification, we improve upon the algorithm of [1] , which solves the 1 point query problem on the optimal weight vector w * ∈ R d in sublinear space. We first give an algorithm solving the more difficult 2 point query problem on w * , also in sublinear space. We then give an algorithm which solves the related 2 heavy hitter problem on w * , in sublinear space and running time. Finally, we give an algorithm which can deterministically solve the 1 point query problem on w * , with sublinear space, improving upon that of [1] . For kernel classification, if w * ∈ R d p is the optimal weight vector classifying points in the stream according to their p th -degree polynomial kernel, then we give an algorithm solving the 2 point query problem on w * in poly( p log d ε ) space, and an algorithm solving the 2 heavy hitter problem in poly( p log d ε ) space and running time. Note that our space and running time is polynomial in p, making our algorithms well-suited to high-degree polynomial kernels and the Gaussian kernel (approximated by the polynomial kernel of degree p = Θ(log T )). Our algorithms for kernels are in fact a special case of a more general algorithm we give for low-rank tensor inputs. Formally, we consider the following problems in this work: * -w D * 2 as our metric, where D = 512 K. This is similar to the metric used in [1] (that is, w K -w * 2 / w K * -w * 2 ) -we use w D * instead since this omits the smaller weights and might therefore better measure how well the algorithms recover the larger weights. We also note that in place of w * , we use the weight vector that is obtained by online logistic regression for these experiments -this was also done by [1] in their experiments. Results: Here we show a few plots on the RCV1 dataset in Figure 5 , with λ = 10 -5 -all plots are in the supplementary. With a memory budget of 16 KB, when K = 120, Algorithm 2 gives an improvement of roughly 15% over the algorithm of [1] , which in turn performs better than Algorithm 1. With memory budgets of 2 KB and 4 KB, Algorithm 1 has the best performance.