NeurIPS2024

Clustering with Non-adaptive Subset Queries

Hadley Black, Euiwoong Lee, Arya Mazumdar, Barna Saha

Abstract

Recovering the underlying clustering of a set UU of nn points by asking pair-wise same-cluster queries has garnered significant interest in the last decade. Given a query SUS \subset U, S=2|S|=2, the oracle returns yes if the points are in the same cluster and no otherwise. For adaptive algorithms with pair-wise queries, the number of required queries is known to be Θ(nk)\Theta(nk), where kk is the number of clusters. However, non-adaptive schemes require Ω(n2)\Omega(n^2) queries, which matches the trivial O(n2)O(n^2) upper bound attained by querying every pair of points. To break the quadratic barrier for non-adaptive queries, we study a generalization of this problem to subset queries for S>2|S|>2, where the oracle returns the number of clusters intersecting SS. Allowing for subset queries of unbounded size, O(n)O(n) queries is possible with an adaptive scheme (Chakrabarty-Liao, 2024). However, the realm of non-adaptive algorithms is completely unknown. In this paper, we give the first non-adaptive algorithms for clustering with subset queries. Our main result is a non-adaptive algorithm making O(nlogk(logk+loglogn)2)O(n \log k \cdot (\log k + \log\log n)^2) queries, which improves to O(nloglogn)O(n \log \log n) when kk is a constant. We also consider algorithms with a restricted query size of at most ss. In this setting we prove that Ω(max(n2/s2,n))\Omega(\max(n^2/s^2,n)) queries are necessary and obtain algorithms making O~(n2k/s2)\tilde{O}(n^2k/s^2) queries for any sns \leq \sqrt{n} and O~(n2/s)\tilde{O}(n^2/s) queries for any sns \leq n. We also consider the natural special case when the clusters are balanced, obtaining non-adaptive algorithms which make O(nlogk)+O~(k)O(n \log k) + \tilde{O}(k) and O(nlog2k)O(n\log^2 k) queries. Finally, allowing two rounds of adaptivity, we give an algorithm making O(nlogk)O(n \log k) queries in the general case and O(nloglogk)O(n \log \log k) queries when the clusters are balanced.