NeurIPS2024
Linear Time Approximation Algorithm for Column Subset Selection with Local Search
Yuanbin Zou, Ziyun Huang, Jinhui Xu, Jianxin Wang, Qilong Feng
Abstract
The Column Subset Selection (CSS) problem has been widely studied in dimensionality reduction and feature selection. The goal of the CSS problem is to output a submatrix S , consisting of k columns from an n × d input matrix A that minimizes the residual error ∥ A − SS † A ∥ 2 F , where S † is the Moore-Penrose inverse matrix of S . Many previous approximation algorithms have non-linear running times in both n and d , while the existing linear-time algorithms have a relatively larger approximation ratios. Additionally, the local search algorithms in existing results for solving the CSS problem are heuristic. To achieve linear running time while maintaining better approximation using a local search strategy, we propose a local search-based approximation algorithm for the CSS problem with exactly k columns selected. A key challenge in achieving linear running time with the local search strategy is how to avoid exhaustive enumerations of candidate columns for constructing swap pairs in each local search step. To address this issue, we propose a two-step mixed sampling method that reduces the number of enumer-ations for swap pair construction from O ( dk ) to k in linear time. Although the two-step mixed sampling method reduces the search space of local search strategy, bounding the residual error after swaps is a non-trivial task