NeurIPS2023

Knowledge Distillation for High Dimensional Search Index

Zepu Lu, Jin Chen, Defu Lian, Zaixi Zhang, Yong Ge, Enhong Chen

被引用 9 次

摘要

Lightweight compressed indexes are prevalent in Approximate Nearest Neighbor Search (ANNS) and Maximum Inner Product Search (MIPS) owing to their superiority of retrieval efficiency in large-scale datasets. However, results given by compressed indexes are less accurate due to the curse of dimension and limitation of optimization objectives (e.g., lacking interactions between queries and documents). Thus, we are encouraged to design a new learning algorithm for the compressed search index in high dimensions to improve retrieval performance. In this paper, we propose a novel K nowledge D istillation for high dimensional search index framework ( KDindex ), with the aim of efficiently learning lightweight indexes by distilling knowledge from high-precision ANNS and MIPS models such as graph-based indexes. Specifically, the student is guided to keep the same ranking order of the top-k relevant results yielded by the teacher model, which acts as the additional supervision signals between queries and documents to learn the similarities between documents. Furthermore, to avoid the trivial solutions that all candidates are partitioned to the same post list, the reconstruction loss that minimizes the compressed error, and the posting list balance strategy that equally allocates the candidates, are integrated into the learning objective. Experiment re-sults demonstrate that KDindex outperforms existing learnable quantization-based indexes and is 40× lighter than the state-of-the-art non-exhaustive methods while achieving comparable recall quality.