VLDB2024

GTI: Graph-based Tree Index with Logarithm Updates for Nearest Neighbor Search in High-Dimensional Spaces

Ruiyao Ma, Yifan Zhu, Baihua Zheng, Lu Chen, Congcong Ge, Yunjun Gao

被引用 4 次

摘要

Nearest neighbor search (NNS) is fundamental for high-dimensional space retrieval and impacts various fields, such as pattern recognition, information retrieval, recommendation systems, and vector database management. Among existing NNS methods, graph-based methods often excel in query accuracy and efficiency. However, these methods face significant challenges, including high construction costs and difficulties with dynamic data updates. Recent efforts have focused on combining graph methods with hashing, quantization, and tree-based approaches to address these issues, but problems with large index sizes and update performance remain unresolved. In response, this paper proposes GTI, a novel, lightweight, and dynamic graph-based tree index for high-dimensional NNS. GTI constructs a tree index built across the entire dataset and employs a lightweight graph index at the level 1 of the tree to significantly reduce graph construction costs. It also features effective data insertion and deletion algorithms that enable logarithmic real-time updates. Additionally, we have developed an effective NNS algorithm for GTI, which not only achieves approximate search performance on par with SOTA graph-based methods but also supports exact NNS. Extensive experiments on six real-world datasets demonstrate that GTI achieves an approximately 10× improvement in update efficiency compared to SOTA tree-based methods, while achieving search effectiveness comparable to SOTA approximate NNS methods. These results underscore the potential of GTI for effective application in dynamic and evolving scenarios.