CVPR2024

Evaluating Transferability in Retrieval Tasks: An Approach Using MMD and Kernel Methods

Mengyu Dai, Amir Hossein Raffiee, Aashish Jain, Joshua Correa

Abstract

Retrieval tasks play a central role in real-world machine learning systems such as search engines, recommender systems, and retrieval-augmented generation (RAG). Achieving good performance in these tasks often requires fine-tuning several pretrained models on a specific dataset and selecting the best candidate, a process that can be both time- and resource-intensive. To address this problem, we introduce RetMMD, a novel and efficient method that leverages Maximum Mean Discrepancy (MMD) and kernel methods to assess the transferability of pretrained models in retrieval tasks. RetMMD is computed from a pretrained model and a target dataset, without any fine-tuning. Specifically, given a query, we quantify the distribution discrepancy between relevant and irrelevant document embeddings by estimating the similarities within their mappings in the pretrained embedding space through a kernel method. This discrepancy is averaged over multiple queries, taking into account the distribution characteristics of the target dataset. Experiments suggest that the proposed metric, calculated on pretrained models, closely aligns with retrieval performance after fine-tuning. The observation holds across a variety of datasets spanning image, text, and multi-modal domains, indicating the potential of MMD and kernel methods for transfer learning evaluation in retrieval scenarios. In addition, we design a way of evaluating dataset transferability for retrieval tasks, and experimental results demonstrate the effectiveness of the proposed approach.
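
To make the per-query computation concrete, the following is a minimal sketch of an MMD score between relevant and irrelevant document embeddings, averaged over queries. It is not the RetMMD definition itself: the abstract does not specify the kernel, the estimator, or how the query enters the computation. The RBF kernel, the biased (V-statistic) estimator, the bandwidth `gamma`, and the function names `rbf_kernel`, `mmd2`, and `mean_query_mmd` are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Pairwise RBF kernel exp(-gamma * ||a_i - b_j||^2) (assumed kernel choice)."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def mmd2(x, y, gamma=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    return (
        rbf_kernel(x, x, gamma).mean()
        + rbf_kernel(y, y, gamma).mean()
        - 2.0 * rbf_kernel(x, y, gamma).mean()
    )


def mean_query_mmd(doc_emb, relevance, gamma=1.0):
    """Average the squared MMD between relevant and irrelevant documents over queries.

    doc_emb:   (n_docs, d) embeddings from the pretrained model (no fine-tuning).
    relevance: (n_queries, n_docs) boolean matrix; True marks a relevant document.
    Queries with no relevant or no irrelevant documents are skipped (assumption).
    """
    scores = []
    for rel in relevance:
        pos, neg = doc_emb[rel], doc_emb[~rel]
        if len(pos) == 0 or len(neg) == 0:
            continue
        scores.append(mmd2(pos, neg, gamma))
    return float(np.mean(scores))
```

Under these assumptions, a larger score means that the pretrained embedding already separates relevant from irrelevant documents more clearly. Such a score could be computed for each candidate model on the same target dataset and the models ranked by it, which is the kind of comparison the abstract describes.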