ICLR2026
Gradient-Based Diversity Optimization with Differentiable Top-k Objective
Tianyi Zhou, Sebastian Dalleiger, Ece Calikus, Aristides Gionis
Abstract
Predicting relevance is a pervasive problem across digital platforms, covering social media, entertainment, and commerce. However, when optimized solely for relevance and engagement, many machine-learning models amplify data biases and produce homogeneous outputs, reinforcing filter bubbles and content uniformity. To address this issue, we introduce a pairwise top-k diversity objective with a differentiable smooth-ranking approximation, providing a model-agnostic way to incorporate diversity optimization directly into standard gradient-based learning. Building on this objective, we cast relevance and diversity as a joint optimization problem, analyze the resulting gradient trade-offs, and propose two complementary strategies: direct optimization, which modifies the learning objective, and indirect optimization, which reweights training data. Both strategies can be applied either when training models from scratch or when fine-tuning existing relevance-optimized models. We use recommendation as a natural evaluation setting where scalability and diversity are critical, and show through extensive experiments that our methods consistently improve diversity with negligible accuracy loss. Notably, fine-tuning with our objective is especially efficient, requiring only a few gradient steps to encode diversity at scale.

Published as a conference paper at ICLR 2026

To address these challenges, we propose a unified framework that leverages differentiable ranking to optimize diversity in top-k prediction sets in a scalable and model-agnostic way. At its core is an effective diversity objective that can be integrated into gradient-based training without requiring architecture changes or post-processing. Building on this objective, we introduce two diversification methods:
(i) direct diversity-guided tuning (DDT), which augments the loss with a joint relevance-diversity term, and (ii) meta-diversity reweighting (MDR), which preserves relevance-only training while reweighting data points using the joint loss as a meta-objective. Our approach offers a flexible alternative to post-hoc or model-specific diversification without compromising efficiency.

Our contributions are threefold: (1) we propose a unified differentiable framework for optimizing relevance and diversity in top-k prediction sets, applicable to both end-to-end training and fine-tuning; (2) we provide a theoretical analysis of gradient conflicts, deriving feasible intervals for the trade-off parameter β and showing that an adaptive update coincides with the two-objective solution of the multi-gradient descent algorithm (MGDA), guaranteeing convergence to Pareto-stationary points; (3) we empirically validate the framework on five benchmark datasets and two model architectures, demonstrating that DDT and MDR achieve substantial diversity improvements with minimal relevance loss, outperforming strong baselines. Notably, the diversity gains extend beyond the explicitly optimized top-k range, reshaping subsequent predictions as well.

RELATED WORK

Our work is related to diverse recommender systems and multi-objective learning.

Diversity in recommender systems. Among the vast literature on recommender systems, the closest to our work are post-hoc and learning-based diversification methods; see the survey by Zhao et al. (2025) for a broader overview. Post-hoc methods re-rank the output of a relevance-only model to balance relevance and diversity. Representative approaches include maximal marginal relevance (MMR) (Carbonell & Goldstein, 1998), diversity-weighted utility maximization (DUM) (Ashkan et al., 2015), and determinantal point processes (DPP) (Chen et al., 2018).
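To make the post-hoc setting concrete, the following is a minimal sketch of greedy MMR-style re-ranking; it is an illustration, not the implementation used in any of the cited works. It assumes cosine similarity over item embeddings as the redundancy measure and a trade-off weight lam. The function names cosine and mmr_rerank, the parameter names, and the toy data are all hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mmr_rerank(relevance, embeddings, k, lam=0.5):
    """Greedily select up to k item indices, trading relevance against redundancy.

    relevance:  scores from a relevance-only model (higher is better)
    embeddings: item feature vectors used to measure similarity (an assumption)
    lam:        1.0 recovers pure relevance ranking; smaller values favor diversity
    """
    candidates = list(range(len(relevance)))
    selected = []
    while candidates and len(selected) < k:

        def mmr_score(i):
            # Redundancy is the similarity to the closest already-selected item.
            redundancy = max(
                (cosine(embeddings[i], embeddings[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1.0 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy example: items 0 and 1 are duplicates; item 2 differs but is less relevant.
relevance = [0.9, 0.85, 0.3]
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
top2_relevance_only = mmr_rerank(relevance, embeddings, k=2, lam=1.0)  # [0, 1]
top2_balanced = mmr_rerank(relevance, embeddings, k=2, lam=0.5)        # [0, 2]
```

In the toy example, the balanced setting replaces the duplicate item 1 with the less relevant but dissimilar item 2. The selection only reorders scores that the relevance model has already produced, so it cannot influence what the model learns.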
These methods are model-agnostic and easy to implement, but their performance is limited by the quality of the initial relevance ranking, and diversity gains usually come at the cost of reduced accuracy (Chen et al., 2017). Learning-based approaches incorporate diversity objectives directly into training, including penalties for similarity among recommended items (Hurley, 2013; Wasilewski & Hurley, 2016), list-wise formulations that optimize relevance-diversity trade-offs (Wang et al., 2023), and graph-based models that encourage coverage of item categories or long-tail exposure (Zheng et al., 2021; Yang et al., 2023). Although these approaches often outperform post-hoc re-ranking, they require architectural modifications or adversarial training, making them model-specific and computationally heavy. In contrast, our framework is differentiable and model-agnostic: it can be integrated into standard training pipelines without altering architectures or adding inference overhead.

Multi-objective learning. Also related is the study of multi-objective optimization for balancing goals such as accuracy, fairness, and revenue (Zheng & Wang, 2022). Classical approaches include scalarization (Paul et al., 2022; Di Noia et al., 2017), which reduces multiple objectives to a single weighted loss, and population-based heuristics such as ev