KDD2023
Learning Balanced Tree Indexes for Large-Scale Vector Retrieval
Wuchao Li, Chao Feng, Defu Lian, Yuxin Xie, Haifeng Liu, Yong Ge, Enhong Chen
被引用 10 次
摘要
Vector retrieval focuses on finding the k-nearest neighbors from a bunch of data points, and is widely used in a diverse set of areas such as information retrieval and recommender system. The current state-of-the-art methods represented by HNSW usually generate indexes with a big memory footprint, restricting the scale of data they can handle, except resorting to a hybrid index with external storage. The space-partitioning learned indexes, which only occupy a small memory, have made great breakthroughs in recent years. However, these methods rely on a large amount of labeled data for supervised learning, so model complexity affects the generalization.