WWW2026
Kleene Closure Property Path Query Optimization Based on Node Clustered Index
Tenglong Ren, Lulu Yang, Xiaowang Zhang, Zhiyong Feng
Abstract
The evaluation of Kleene Closure Property Path Queries (KPPQs) over RDF graphs presents a fundamental trade-off. Under the intuitive simple path semantics, which forbids node repetition, the problem is NP-complete. This forces practical systems to adopt existential path semantics, which admits tractable evaluation but produces result sets bloated with redundant and often semantically meaningless paths due to unnecessary detours through cycles. This paper introduces a novel indexing framework that effectively navigates this trade-off. Our key insight is that while enumerating all simple paths is intractable, a carefully selected, compact set of representative simple paths is sufficient to correctly and completely answer connectivity queries in practice. We propose the Node Clustered Index (NCI), which pre-materializes such paths—including the longest acyclic chains and fundamental cycles —transforming online query processing from a costly graph traversal into an efficient index lookup and expansion operation. Crucially, our method guarantees that every result conforms to simple path semantics. Extensive experiments demonstrate that our approach achieves near-perfect recall, outperforming state-of-the-art systems by up to an order of magnitude. This work establishes that for NP-complete path problems, seeking polynomial-time, empirically perfect approximations is a viable and powerful paradigm, delivering both performance and quality without compromise.