VLDB2025
Elastic Index Selection for Label-Hybrid AKNN Search
Mingyu Yang, Wenxuan Xia, Wentao Li, Raymond Chi-Wing Wong, Wei Wang
Abstract
Real-world vector embeddings often carry additional label attributes, such as keywords and tags. In this context,
label-hybrid approximate k -nearest neighbor (AKNN)
search retrieves the top- k approximate nearest vectors to a query, subject to the constraint that their labels fully contain the query-label set. A naive solution builds a separate index for every query-label set, but the exponential growth of such sets makes this approach storage-prohibitive. To overcome this, we propose selectively indexing only a subset of query-label sets while still ensuring efficient processing for all queries. This is made possible by a key insight into label containment: an index built for a label set L can also serve any query whose label set L' is a superset of L , with query cost bounded by the elastic factor—the ratio between the number of vectors matching L and those matching L' . We formalize the index-selection task as a constrained optimization problem that chooses which label sets to index to satisfy space and query efficiency constraints. We prove the problem is NP-complete and propose efficient greedy algorithms for its efficiency- and space-constrained variants. Extensive experiments on real-world datasets show that our method achieves 10X-800X speedups over state-of-the-art techniques. Moreover, our approach is index-agnostic and can be seamlessly integrated into existing vector database systems.