ICML2020
Optimal Estimator for Unlabeled Linear Regression
Hang Zhang, Ping Li
被引用 30 次
摘要
Unlabeled linear regression, or "linear regression with an unknown permutation", has attracted increasing attentions due to its applications in (e.g.,) linkage record and de-anonymization. However, the computation of unlabeled linear regression proves to be cumbersome and existing algorithms typically require considerable time, especially in the high dimensional regime. In this paper, we propose a one-step estimator which is optimal from both the computational and the statistical aspects. From the computational perspective, our estimator exhibits the same order of computational complexity as that of the oracle case (which means the regression coefficients are known in advance and only the permutation needs recovery). From the statistical perspective, when comparing with the necessary conditions for permutation recovery, our requirement on the signal-to-noise ratio (SNR) agrees up to merely Ω (log log n) difference when the stable rank of the regression coefficients B is much less than log n/ log log n. Numerical experiments are also provided to corroborate the theoretical claims.