ICML2021
Householder Sketch for Accurate and Accelerated Least-Mean-Squares Solvers
Jyotikrishna Dass, Rabi N. Mahapatra
被引用 2 次
摘要
Least-Mean-Squares (LMS) solvers comprise a class of fundamental optimization problems such as linear regression, and regularized regressions such as Ridge, LASSO, and Elastic-Net. Data summarization techniques for big data generate summaries called coresets and sketches to speed up model learning under streaming and distributed settings. For example, (Maalouf et al., 2019) design a fast and accurate Caratheodory set on input data to boost the performance of existing LMS solvers. In retrospect, we explore classical Householder transformation as a candidate for sketching and accurately solving LMS problems. We find it to be a simpler, memory-efficient, and faster alternative that always existed to the above strong baseline. We also present a scalable algorithm based on the construction of distributed Householder sketches to solve LMS problem across multiple worker nodes. We perform thorough empirical analysis with large synthetic and real datasets to evaluate the performance of Householder sketch and compare with (Maalouf et al., 2019) . Our results show Householder sketch speeds up existing LMS solvers in the scikit-learn library up to 100x-400x. Also, it is 10x-100x faster than the above baseline with similar numerical stability. The distributed algorithm demonstrates linear scalability with a near-negligible communication overhead.