NeurIPS2021

KS-GNN: Keywords Search over Incomplete Graphs via Graphs Neural Network

Yu Hao, Xin Cao, Yufan Sheng, Yixiang Fang, Wei Wang

22 citations

Abstract

Keyword search is a fundamental task to retrieve information that is the most relevant to the query keywords. Keyword search over graphs aims to find subtrees or subgraphs containing all query keywords ranked according to some criteria. Existing studies all assume that the graphs have complete information. However, real-world graphs may contain some missing information (such as edges or keywords), thus making the problem much more challenging. To solve the problem of keyword search over incomplete graphs, we propose a novel model named KS-GNN based on the graph neural network and the auto-encoder. By considering the latent relationships and the frequency of different keywords, the proposed KS-GNN aims to alleviate the effect of missing information and is able to learn low-dimensional representative node embeddings that preserve both graph structure and keyword features. Our model can effectively answer keyword search queries with linear time complexity over incomplete graphs. The experiments on four real-world datasets show that our model consistently achieves better performance than state-of-the-art baseline methods in graphs having missing information. * Corresponding author. 35th Conference on Neural Information Processing Systems (NeurIPS 2021). Query ๐‘ž๐‘ž= c, e, f 1 2 3 4 5 a, d, e b, c a, b c, f a, e d, f 6 (a) Graph ๐บ๐บ (b) Incomplete Graph ๐บ๐บ๐บ Answers = ๐‘ฃ๐‘ฃ 1 , ๐‘ฃ๐‘ฃ 2 , ๐‘ฃ๐‘ฃ 6 1 2 3 4 5 2 Related Work Keyword Search in Graphs. Keyword search over graph data aims to find the top-k subtrees or subgraphs according to some ranking criteria. The conventional methods design algorithms assuming that the graphs have complete information. For example, DBXplorer [13] proposes to utilize the number of the answer's edges as the scoring function. BANKS [14] model tuples as nodes in a graph and then performs keyword search using proximity-based ranking. He et al. [2] propose a general ranking function considering both graph structure and content. BLINKS also builds an efficient bi-level index structure to improve efficiency. Kargar and An, motivated by the Steiner tree problem,