NeurIPS2022

Semi-supervised Active Linear Regression

Nived Rajaraman, Devvrit, Pranjal Awasthi

被引用 1 次

摘要

Labeled data often comes at a high cost as it may require recruiting human labelers or running costly' experiments. At the same time, in many practical scenarios, one already has access to a partially labeled, potentially biased dataset that can help with the learning task at hand. Motivated by such settings, we formally initiate a study of semi-supervised active learning through the frame of linear regression. Here, the learner has access to a dataset X ∈ R (nun+nlab)×d composed of n un unlabeled examples that a learner can actively query, and n lab examples labeled a priori. Denoting the true labels by Y ∈ R nun+nlab , the learner's objective is to find while querying the labels of as few unlabeled points as possible. In this paper, we introduce an instance dependent parameter called the reduced rank, denoted R X , and propose an efficient algorithm with query complexity O(R X / ). This result directly implies improved upper bounds for two important special cases: (i) active ridge regression, and (ii) active kernel ridge regression, where the reducedrank equates to the statistical dimension, sd λ and effective dimension, d λ of the problem respectively, where λ ≥ 0 denotes the regularization parameter. Finally, we introduce a distributional version of the problem as a special case of the agnostic formulation we consider earlier; here, for every X, we prove a matching instancewise lower bound of Ω(R X / ) on the query complexity of any algorithm. * equal contribution 36th Conference on Neural Information Processing Systems (NeurIPS 2022).