NeurIPS 2025

Diversity Is All You Need for Contrastive Learning: Spectral Bounds on Gradient Magnitudes

Peter Ochieng

Abstract

We derive non-asymptotic spectral bands that bound the squared InfoNCE gradient norm via alignment, temperature, and batch spectrum, recovering the $1/\tau^{2}$ law and closely tracking batch-mean gradients on synthetic data and ImageNet. Using effective rank $R_{\mathrm{eff}}$ as an anisotropy proxy, we design spectrum-aware batch selection, including a fast greedy builder. On ImageNet-100, Greedy-64 cuts time-to-67.5% top-1 by 15% vs. random (24% vs. Pool--P3) at equal accuracy; CIFAR-10 shows similar gains. In-batch whitening promotes isotropy and reduces 50-step gradient variance by $1.37\times$, matching our theoretical upper bound.
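To make the roles of effective rank and in-batch whitening concrete, the following is a minimal NumPy sketch. It rests on assumptions that the abstract does not confirm: the effective rank is taken to be the participation ratio $(\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$ of the batch covariance eigenvalues (other definitions, such as the entropy-based one, are also common), and whitening is done with a ZCA transform. The function names `effective_rank` and `whiten_batch` are hypothetical and are not taken from the paper.

```python
import numpy as np


def effective_rank(z, eps=1e-12):
    """Participation-ratio effective rank of the batch covariance.

    Assumed definition: (sum of eigenvalues)^2 / (sum of squared eigenvalues).
    It ranges from 1 (all variance in one direction) to d (fully isotropic).
    """
    zc = z - z.mean(axis=0, keepdims=True)       # center the batch
    cov = zc.T @ zc / z.shape[0]                 # d x d batch covariance
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    return float(eig.sum() ** 2 / (np.sum(eig ** 2) + eps))


def whiten_batch(z, eps=1e-5):
    """ZCA-whiten a batch so its covariance is close to the identity.

    ZCA is an assumed choice here; the source only says "in-batch whitening".
    """
    zc = z - z.mean(axis=0, keepdims=True)
    cov = zc.T @ zc / z.shape[0]
    eig, vec = np.linalg.eigh(cov)
    inv_sqrt = 1.0 / np.sqrt(np.clip(eig, 0.0, None) + eps)
    w = (vec * inv_sqrt) @ vec.T                 # V diag(lambda^-1/2) V^T
    return zc @ w


# Synthetic anisotropic batch: a few directions carry most of the variance.
rng = np.random.default_rng(0)
scales = np.array([10.0, 5.0, 1.0, 0.5, 0.1, 0.1, 0.1, 0.1])
z = rng.normal(size=(256, 8)) * scales

r_before = effective_rank(z)
r_after = effective_rank(whiten_batch(z))
print(f"effective rank before whitening: {r_before:.2f}")
print(f"effective rank after whitening:  {r_after:.2f}")
```

Under these assumptions, the anisotropic batch has an effective rank well below its dimension of 8, and whitening pushes it close to 8. That is the sense in which whitening "promotes isotropy" in this sketch. A spectrum-aware batch selector could use a score like `effective_rank` to compare candidate batches, though the paper's actual greedy builder is not specified in the abstract.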