SIGMOD2025
Query Optimization for Database-Returning Queries
Simon Rink, Jens Dittrich
Abstract
Recently, the novel concept of database-returning SQL queries (DRQs) was introduced. Instead of a single, (potentially) denormalized result table, DRQs return an entire subdatabase with a single SQL query. This subdatabase represents a subset of the original database, reduced to the relations, tuples, and attributes that contribute to the traditional join result. DRQs offer several benefits: they reduce network traffic in client-server settings, can lower memory requirements for materializing results, and significantly simplify querying hierarchical data. Currently, two state-of-the-art algorithms exist to compute DRQs: (1.) ResultDB Semi-Join builds upon Yannakakis' semi-join reduction algorithm by adding support for cyclic queries. (2.) ResultDB Decompose computes the standard single-table result and projects the result to the base tables to obtain the resulting subdatabase. However, multiple issues can be identified with these algorithms. First, ResultDB Semi-Join employs simple heuristics to greedily solve the underlying enumeration problems, often leading to suboptimal query plans. Second, each algorithm performs best under different conditions, so it is up to the user to choose the appropriate one for a given scenario. In this paper, we address these two issues. We propose two enumeration algorithms for ResultDB Semi-Join to handle the Root Node Enumeration Problem (RNEP) and the Tree Folding Enumeration Problem (TFEP). Further, we present a unified enumeration algorithm, TD ResultDB , to automatically decide between plans generated by our new enumeration algorithms for ResultDB Semi-Join and ResultDB Decompose . Through a comprehensive experimental evaluation, we show that the enumeration time overhead introduced by our methods remains minimal. Furthermore, by effectively solving the RNEP and TFEP, we achieve up to a 6x speed-up in query execution time for ResultDB textSemi-Join , whereas TD ResultDB consistently selects the best available plans.