SIGMOD2025
QURE: AI-Assisted and Automatically Verified UDF Inlining
Tarique Siddiqui, Arnd Christian König, Jiashen Cao, Cong Yan, Shuvendu K. Lahiri
2 citations
Abstract
User-defined functions (UDFs) extend the capabilities of SQL by improving code reusability and encapsulating complex logic, but can hinder the performance due to optimization and execution inefficiencies. Prior approaches attempt to address this by rewriting UDFs into native SQL, which is then inlined into the SQL queries that invoke them. However, these approaches are either limited to simple pattern matching or require the synthesis of complex verification conditions from procedural code, a process that is brittle and difficult to automate. This limits coverage and makes the translation approaches less extensible to previously unseen procedural constructs. In this work, we present QURE, a framework that (1) leverages large language models (LLMs) to translate UDFs to native SQL, and (2) introduces a novel formal verification method to establish equivalence between the UDF and its translation. QURE uses the semantics of SQL operators to automate the derivation of verification conditions, in turn resulting in broad coverage and high extensibility. We model a large set of imperative constructs, particularly those common in Python and Pandas UDFs, in an intermediate verification language, allowing for the verification of their SQL translation. In our empirical evaluation of Python and Pandas UDFs, equivalence is successfully verified for 88% of UDF-SQL pairs (the rest lack semantically-equivalent SQLs) and LLMs correctly translate 84% of the UDFs. Executing the translated UDFs achieves median performance improvements of 23x on single-node clusters and 12x on 12-node clusters compared to the original UDFs, while also significantly reducing out-of-memory errors.