NeurIPS 2023
Model Spider: Learning to Rank Pre-Trained Models Efficiently
Yi-Kai Zhang, Ting-Ji Huang, Yao-Xiang Ding, De-Chuan Zhan, Han-Jia Ye
Abstract
Figuring out which Pre-Trained Model (PTM) from a model zoo fits the target task is essential for taking advantage of plentiful model resources. With numerous heterogeneous PTMs available from diverse fields, efficiently selecting the most suitable PTM is challenging because carrying out forward or backward passes over all PTMs is time-consuming. In this paper, we propose MODEL SPIDER, which tokenizes both PTMs and tasks by summarizing their characteristics into vectors to enable efficient PTM selection. By leveraging the approximated performance of PTMs on a separate set of training tasks, MODEL SPIDER learns to construct the tokens and to measure the fitness score of a model-task pair from their tokens. The ability to rank relevant PTMs higher than others generalizes to new tasks. For the top-ranked PTM candidates, we further learn to enrich the task tokens with PTM-specific semantics and re-rank these PTMs for better selection. MODEL SPIDER balances efficiency and selection ability, making PTM selection like a spider preying on a web. MODEL SPIDER demonstrates promising performance across various model-zoo configurations.
Related Works
Efficient PTM Search with Transferability Assessment. Whether a selected PTM is helpful can be formulated as the problem of measuring the transferability from the source data used to pre-train the PTM to the target downstream task [12, 33, 4, 62]. Current transferability evaluation relies on a forward pass of the PTM over the target task, which produces PTM-specific features of the target data. For example, NCE [73], LEEP [53], LogME [83, 84], PACTran [21], and TransRate [32] estimate the negative conditional entropy, log expectation, marginalized likelihood, PAC-Bayesian bound, and mutual information, respectively, as proxy metrics of transferability. Several extensions including N-LEEP [45] with a Gaussian mixture model on top of PTM features, H-Score [8] utilizing