VLDB2026

Towards Efficient Random-Order Enumeration for Join Queries

Pengyu Chen, Zizheng Guo, Jianwei Yang, Dongjing Miao

Abstract

In many data analysis pipelines, a basic and time-consuming process is to produce join results and feed them into downstream tasks. Numerous enumeration algorithms have been developed for this purpose. To be a statistically meaningful representation of the whole join result, the result tuples are required to be enumerated in uniformly random order. However, existing studies lack an efficient random-order enumeration algorithm with a worst-case runtime guarantee for (cyclic) join queries. In this paper, we study the problem of enumerating the results of a join query in random order. We develop an efficient random-order enumeration algorithm for join queries with no large hidden constants in its complexity, achieving expected O(AGM(Q)Res(Q)log2Q)O(\frac{\mathrm{AGM}(Q)}{|Res(Q)|}\log^2|Q|) delay, O(AGM(Q)logQ)O(\mathrm{AGM}(Q)\log|Q|) total running time after O(QlogQ)O(|Q|\log|Q|)-time index construction, where Q|Q| is the size of input, AGM(Q)\mathrm{AGM}(Q) is the AGM bound, and Res(Q)|Res(Q)| is the size of the join result. We prove that our algorithm is near-optimal in the worst case, under the combinatorial kk-clique hypothesis. Our algorithm requires no query-specific preprocessing and can be flexibly adapted to many common database indexes with only minor modifications. We also devise two non-trivial techniques to speed up the enumeration, and provide an experimental study on our enumeration algorithm along with the speed-up techniques. The experimental results show that our algorithm, enhanced with the proposed techniques, significantly outperforms existing state-of-the-art methods.