ICML2021

Dimensionality Reduction for the Sum-of-Distances Metric

Zhili Feng, Praneeth Kacham, David P. Woodruff

被引用 12 次

摘要

We give a dimensionality reduction procedure to approximate the sum of distances of a given set of n points in R d to any "shape" that lies in a k-dimensional subspace. Here, by "shape" we mean any set of points in R d . Our algorithm takes an input in the form of an n × d matrix A, where each row of A denotes a data point, and outputs a subspace P of dimension O(k 3 /ε 6 ) such that the projections of each of the n points onto the subspace P and the distances of each of the points to the subspace P are sufficient to obtain an ε-approximation to the sum of distances to any arbitrary shape that lies in a k-dimensional subspace of R d . These include important problems such as k-median, k-subspace approximation, and (j, l) subspace clustering with j • l ≤ k. Dimensionality reduction reduces the data storage requirement to (n + d)k 3 /ε 6 from nnz(A). Here nnz(A) could potentially be as large as nd. Our algorithm runs in time nnz(A)/ε 2 + (n + d)poly(k/ε), up to logarithmic factors. For dense matrices, where nnz(A) ≈ nd, we give a faster algorithm, that runs in time nd + (n + d)poly(k/ε) up to logarithmic factors. Our dimensionality reduction algorithm can also be used to obtain poly(k/ε) size coresets for k-median and (k, 1)-subspace approximation problems in polynomial time. * col )) = (1 ± 1/6) A Ij (I -BB T )S T R -1 * col 1 .