NeurIPS 2023

Geometry-Aware Adaptation for Pretrained Models

Nicholas Carl Roberts, Xintong Li, Dyah Adila, Sonia Cromp, Tzu-Heng Huang, Jitian Zhao, Frederic Sala

Cited by 3

Abstract

Machine learning models, including prominent zero-shot models, are often trained on datasets whose labels are only a small proportion of a larger label space. Such spaces are commonly equipped with a metric that relates the labels via distances between them. We propose a simple approach to exploit this information to adapt the trained model to reliably predict new classes (or, in the case of zero-shot prediction, to improve its performance) without any additional training. Our technique is a drop-in replacement of the standard prediction rule, swapping arg max with the Fréchet mean. We provide a comprehensive theoretical analysis for this approach, studying (i) learning-theoretic results trading off label space diameter, sample complexity, and model dimension, (ii) characterizations of the full range of scenarios in which it is possible to predict any unobserved class, and (iii) an optimal active learning-like next class selection procedure to obtain optimal training classes when it is not possible to predict the entire range of unobserved classes. Empirically, using easily available external metrics, our proposed approach, LOKI, gains up to 29.7% relative improvement over SimCLR on ImageNet and scales to hundreds of thousands of classes. When no such metric is available, LOKI can use self-derived metrics from class embeddings and obtains a 10.5% improvement on pretrained zero-shot models such as CLIP.

How best to exploit relational structure remains unclear, and several key challenges arise. First, we might wish to know which particular subset of classes is rich enough to enable predicting many (or all) of the remaining labels. This is crucial in determining whether a training set is usable or, even with the aid of structure, insufficient. Second, it is unclear how approaches that use relational information interact with the statistical properties of learning, such as training sample complexity. Finally, performing adaptation requires an efficient and scalable algorithm.
This work addresses these challenges. It proposes a simple and practical approach to learning in structured label spaces, with theoretical guarantees. First, we offer a simple way to translate the soft outputs (i.e., probability vectors) produced by any supervised learning model into a more general model that can exploit geometric information for label structure. In other words, our approach, called LOKI,¹ is a simple adaptor for pretrained models. LOKI can be applied via a fixed linear transformation of the model outputs. LOKI's simplicity makes it applicable to a broad range of settings while enabling very high-cardinality predictions subject to a potentially small model output budget; we provide a visualization of this key idea in Figure 1.

¹ Refers to the 'locus' (plural: loci) of the Fréchet mean.
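As a rough illustration of swapping arg max for a Fréchet mean, the following minimal sketch assumes the standard squared-distance Fréchet mean restricted to a finite set of candidate labels. Under that assumption, the cost of every candidate label is a fixed linear map (the squared-distance matrix) applied to the model's probability vector, followed by an arg min. The function name `frechet_mean_predict`, the argument `sq_dists`, and the toy path-graph label space are hypothetical choices made for this example; they are not taken from the paper or its code.

```python
import numpy as np


def frechet_mean_predict(probs, sq_dists):
    """Predict labels by a weighted Frechet mean over a finite label space.

    probs:    (n, K) array of model probabilities over the K observed classes.
    sq_dists: (N, K) array; entry [y, k] is the squared distance between
              candidate label y and observed class k (N >= K is allowed).
    Returns an (n,) array of indices into the N candidate labels.
    """
    probs = np.asarray(probs, dtype=float)
    sq_dists = np.asarray(sq_dists, dtype=float)
    # Fixed linear transformation of the model outputs:
    # costs[i, y] = sum_k probs[i, k] * d(y, k)^2
    costs = probs @ sq_dists.T
    # The Frechet mean is the candidate label with the smallest expected cost.
    return costs.argmin(axis=1)


# Toy label space (an assumption for illustration): five labels on a path
# graph, 0 - 1 - 2 - 3 - 4, with the model trained only on classes 0 and 4.
labels = np.arange(5)
observed = np.array([0, 4])
sq_dists = (labels[:, None] - observed[None, :]) ** 2  # shape (5, 2)

probs = np.array([
    [0.9, 0.1],  # confident in class 0
    [0.5, 0.5],  # torn between the two observed classes
    [0.7, 0.3],  # leaning towards class 0
])

preds = frechet_mean_predict(probs, sq_dists)
print(preds)  # [0 2 1]
```

In this toy setting the second and third inputs are assigned labels 2 and 1, neither of which the model was trained on, whereas arg max could only ever return the observed classes 0 or 4.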