CVPR 2025

Active Data Curation Effectively Distills Large-Scale Multimodal Models

Vishaal Udandarao, Nikhil Parthasarathy, Muhammad Ferjad Naeem, Talfan Evans, Samuel Albanie, Federico Tombari, Yongqin Xian, Alessio Tonioni, Olivier J. Hénaff

Abstract

Knowledge distillation (KD) is the de facto standard for compressing large-scale multimodal models into smaller ones. Prior works have explored ever more complex KD strategies involving different objectives, teacher ensembles, and weight inheritance. In this work, we explore an alternative yet simple approach: active data curation as effective distillation for contrastive multimodal pretraining. Our simple online batch selection method, ACID, outperforms strong KD baselines across various model, data, and compute configurations. Further, we find that such an active curation strategy is in fact complementary to standard KD, and that the two can be effectively combined to train highly performant, inference-efficient models. Our simple and scalable pretraining framework, ACED, achieves state-of-the-art results across 27 zero-shot classification and image-text retrieval tasks with up to 11% fewer inference FLOPs. We further demonstrate that ACED yields strong vision encoders for training generative multimodal models, outperforming larger vision encoders on image-captioning and visual question-answering tasks.