NeurIPS2023
DESSERT: An Efficient Algorithm for Vector Set Search with Vector Set Queries
Joshua Engels, Benjamin Coleman, Vihan Lakshman, Anshumali Shrivastava
20 citations
Abstract
We study the problem of vector set search with vector set queries. This task is analogous to traditional near-neighbor search, with the exception that both the query and each element in the collection are sets of vectors. We identify this problem as a core subroutine for semantic search applications and find that existing solutions are unacceptably slow. Towards this end, we present a new approximate search algorithm, DESSERT (DESSERT Effeciently Searches Sets of Embeddings via Retrieval Tables). DESSERT is a general tool with strong theoretical guarantees and excellent empirical performance. When we integrate DESSERT into ColBERT, a state-of-the-art semantic search model, we find a 2-5x speedup on the MS MARCO and LoTTE retrieval benchmarks with minimal loss in recall, underscoring the effectiveness and practical applicability of our proposal. 1. We develop the first non-trivial algorithm, DESSERT, for the vector set search problem that scales to large collections (n > 10 6 ) of sets with m > 3 items. 2. We formalize the vector set search problem in a rigorous theoretical framework, and we provide strong guarantees for a common (and difficult) instantiation of the problem.