SIGMOD2025

Smallest Synthetic Witnesses for Conjunctive Queries

Aryan Esmailpour, Boris Glavic, Xiao Hu, Stavros Sintos

被引用 1 次

摘要

Given a self-join-free conjunctive query Q and a set of tuples S , a synthetic witness D is a database instance such that the result of Q on D is S . In this work, we are interested in two problems. First, the existence problem ESW decides whether any synthetic witness D exists. Second, given that a synthetic witness exists, the minimization problem SSW computes a synthetic witness of minimal size. The SSW problem is related to the smallest witness problem recently studied by Hu and Sintos [22]; however, the objective and the results are inherently different. More specifically, we show that SSW is poly-time solvable for a wider range of queries. Interestingly, in some cases, SSW is related to optimization problems in other domains, such as the role mining problem in data mining and the edge concentration problem in graph drawing. Solutions to ESW and SSW are of practical interest, e.g., for test database generation for applications accessing a database and for data compression by encoding a dataset S as a pair of a query Q and database D . We prove that ESW is in P, presenting a simple algorithm that, given any S , decides whether a synthetic witness exists in polynomial time in the size of S . Next, we focus on the SSW problem. We show an algorithm that computes a minimal synthetic witness in polynomial time with respect to the size of S for any query Q that has the head-domination property. If Q does not have such a property, then SSW is generally hard. More specifically, we show that for the class of path queries (of any constant length), SSW cannot be solved in polynomial time unless P = NP. We then extend this hardness result to the class of Berge-acyclic queries that do not have the head-domination property, obtaining a full dichotomy of SSW for Berge-acyclic queries. Finally, we investigate the hardness of SSW beyond Berge-acyclic queries by showing that SSW cannot be solved in polynomial time for some cyclic queries unless P = NP.