KDD2025

Fair Diversity Maximization with Few Representatives

Florian Adriaens, Nikolaj Tatti

Abstract

Diversity maximization problem is a well-studied problem where the goal is to find k diverse items. Fair diversity maximization aims to select a diverse subset of k items from a large dataset, while requiring that each group of items be well represented in the output. More formally, given a set of items with labels, our goal is to find k items that maximize the minimum pairwise distance in the set, while maintaining that each label is represented within some budget. In many cases, one is only interested in selecting a handful (say a constant) number of items from each group. In such scenario we show that a randomized algorithm based on padded decompositions improves the state-of-the-art approximation ratio to √log(m) /(3m), where m is the number of labels. The algorithms work in several stages: (i) a preprocessing pruning which ensures that points with the same label are far away from each other(ii) a decomposition phase, where points are randomly placed in clusters such that there is a feasible solution with maximum one point per cluster and that any feasible solution will be diverse(iii) assignment phase, where clusters are assigned to labels, and a representative point with the corresponding label is selected from each cluster. We experimentally verify the effectiveness of our algorithm on large datasets.