NeurIPS2022

Randomized Sketches for Clustering: Fast and Optimal Kernel kk-Means

Rong Yin, Yong Liu, Weiping Wang, Dan Meng

6 citations

Abstract

Kernel k -means is arguably one of the most common approaches to clustering. In this paper, we investigate the efficiency of kernel k -means combined with randomized sketches in terms of both statistical analysis and computational requirements. More precisely, we propose a unified randomized sketches framework to kernel k -means and investigate its excess risk bounds, obtaining the state-of-the-art risk bound with only a fraction of computations. Indeed, we prove that it suffices to choose the sketch dimension Ω( √ n ) to obtain the same accuracy of exact kernel k -means with greatly reducing the computational costs, for sub-Gaussian sketches, the randomized orthogonal system (ROS) sketches, and Nyström kernel k -means, where n is the number of samples. To the best of our knowledge, this is the first result of this kind for unsupervised learning. Finally, the numerical experiments on simulated data and real-world datasets validate our theoretical analysis.