ICML 2024

Measures of diversity and space-filling designs for categorical data

Cédric Malherbe, Emilio Domínguez-Sánchez, Merwan Barlier, Igor Colin, Haitham Bou-Ammar, Tom Diethe

Abstract

Problem: How can we measure the diversity of discrete sequences (e.g. biological and text data)? How can we create balanced training sets for such data?

Goal: Design efficient algorithms that produce diverse sets of discrete sequences, together with algorithms that measure their diversity.

Approach: Relies on combinatorial optimization and greedy algorithms to build approximate algorithms.
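To make the greedy idea concrete, here is a minimal sketch of one standard heuristic: farthest-point (maximin) selection under the Hamming distance. It is an illustration only and is not claimed to be the algorithm proposed in the paper. The names `hamming`, `min_pairwise_distance`, and `greedy_maximin`, the choice of Hamming distance, and the choice of minimum pairwise distance as the diversity score are all assumptions made for this example.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(seqs):
    """Assumed diversity score: smallest Hamming distance between any two sequences.

    Requires at least two sequences.
    """
    return min(hamming(a, b) for i, a in enumerate(seqs) for b in seqs[i + 1:])


def greedy_maximin(candidates, k):
    """Greedy farthest-point selection (illustrative, not the paper's method).

    Starts from the first candidate, then repeatedly adds the candidate whose
    distance to its nearest already-selected sequence is largest.
    Assumes 1 <= k <= number of distinct candidates.
    """
    selected = [candidates[0]]
    # nearest[i] = distance from candidates[i] to the closest selected sequence
    nearest = [hamming(c, selected[0]) for c in candidates]
    while len(selected) < k:
        best = max(range(len(candidates)), key=nearest.__getitem__)
        selected.append(candidates[best])
        # Only the newly added sequence can shrink the nearest distances
        nearest = [min(d, hamming(c, candidates[best]))
                   for d, c in zip(nearest, candidates)]
    return selected


# Toy example: all length-3 sequences over a 4-letter alphabet
candidates = ["".join(p) for p in product("ACGT", repeat=3)]
design = greedy_maximin(candidates, 4)
print(design, min_pairwise_distance(design))
```

On this toy candidate pool the sketch selects four sequences that differ from one another in every position, so the minimum pairwise distance is 3. Each greedy step costs one pass over the candidates, which keeps the heuristic cheap but only approximately optimal in general.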