NeurIPS2023

DESSERT: An Efficient Algorithm for Vector Set Search with Vector Set Queries

Joshua Engels, Benjamin Coleman, Vihan Lakshman, Anshumali Shrivastava

被引用 20 次

摘要

We study the problem of vector set search with vector set queries. This task is analogous to traditional near-neighbor search, with the exception that both the query and each element in the collection are sets of vectors. We identify this problem as a core subroutine for semantic search applications and find that existing solutions are unacceptably slow. Towards this end, we present a new approximate search algorithm, DESSERT (DESSERT Effeciently Searches Sets of Embeddings via Retrieval Tables). DESSERT is a general tool with strong theoretical guarantees and excellent empirical performance. When we integrate DESSERT into ColBERT, a state-of-the-art semantic search model, we find a 2-5x speedup on the MS MARCO and LoTTE retrieval benchmarks with minimal loss in recall, underscoring the effectiveness and practical applicability of our proposal. 1. We develop the first non-trivial algorithm, DESSERT, for the vector set search problem that scales to large collections (n > 10 6 ) of sets with m > 3 items. 2. We formalize the vector set search problem in a rigorous theoretical framework, and we provide strong guarantees for a common (and difficult) instantiation of the problem.