NeurIPS2024

Embedding Dimension of Contrastive Learning and k-Nearest Neighbors

Dmitrii Avdiukhin, Vaggos Chatziafratis, Orr Fischer, Grigory Yaroslavtsev

Abstract

We study the embedding dimension of distance comparison data in two settings: contrastive learning and k -nearest neighbors ( k -NN). Our goal is to find the smallest dimension d of an ℓ p -space in which a given dataset can be represented. We show that the arboricity of the associated graphs plays a key role in designing embeddings. For the most popular ℓ 2 -space, we get tight bounds in both settings. In contrastive learning, we are given m labeled samples ( x i , y + i , z − i ) representing the fact that the positive example y i is closer to the anchor x i than the negative example z i (we also give results for t negatives). For representing such dataset in: