VLDB2021
Beyond Equi-joins: Ranking, Enumeration and Factorization
Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald
被引用 24 次
摘要
We study theta-joins in general and join predicates with conjunctions and disjunctions of inequalities in particular, focusing on ranked enumeration where the answers are returned incrementally in an order dictated by a given ranking function. Our approach achieves strong time and space complexity properties: with 𝑛 denoting the number of tuples in the database, we guarantee for acyclic full join queries with inequality conditions that for every value of 𝑘, the 𝑘 top-ranked answers are returned in O (𝑛 polylog 𝑛 + 𝑘 log 𝑘) time. This is within a polylogarithmic factor of O (𝑛 + 𝑘 log 𝑘), i.e., the best known complexity for equi-joins, and even of O (𝑛 + 𝑘), i.e., the time it takes to look at the input and return 𝑘 answers in any order. Our guarantees extend to join queries with selections and many types of projections (namely those called "free-connex" queries and those that use bag semantics). Remarkably, they hold even when the number of join results is 𝑛 ℓ for a join of ℓ relations. The key ingredient is a novel O (𝑛 polylog 𝑛)-size factorized representation of the query output, which is constructed on-the-fly for a given query and database. In addition to providing the first nontrivial theoretical guarantees beyond equi-joins, we show in an experimental study that our ranked-enumeration approach is also memory-efficient and fast in practice, beating the running time of state-of-the-art database systems by orders of magnitude.