SIGMOD 2025
PLForge: Enhancing Language Models for Natural Language to Procedural Extensions of SQL
Hang Zhang, Chaokun Wang, Hongwei Li, Cheng Wu, Songyao Wang, Yabin Liu, Gengyuan Shi, Ziyang Liu
Abstract
Procedural Language extensions of SQL (abbr. PL/SQL) enhance database programming by integrating procedural constructs with SQL's declarative syntax, thereby improving the reusability, modularity, and maintainability of SQL code. However, writing PL/SQL in database systems presents significant challenges in real-world development, primarily due to the inherent complexity of programming. To reduce this difficulty, this paper studies the novel task of translating natural language (NL) to PL/SQL (i.e., NL-to-PL/SQL). Recent advancements in language models have shown promise in translating natural language questions into SQL queries (i.e., Text-to-SQL). However, state-of-the-art Text-to-SQL methods focus only on single SQL queries and neglect the procedural extensions of SQL, which limits their effectiveness for the NL-to-PL/SQL task. In this paper, we propose PLForge, a suite of pre-trained language models with 3B, 7B, and 15B parameters, tailored for the NL-to-PL/SQL task. To enhance the PL/SQL generation capabilities of PLForge, we leverage a curated PL/SQL-centric data corpus and employ an incremental pre-training approach. Furthermore, to fully exploit the potential of PLForge, we propose a comprehensive prompt construction strategy tailored specifically for PL/SQL. Given the scarcity of NL-to-PL/SQL datasets, we develop a template-based method for generating NL-to-PL/SQL data. We conduct a series of experiments on PLForge and several baseline models. Measured by execution match and exact match metrics designed specifically for the NL-to-PL/SQL task, the experimental results demonstrate that PLForge outperforms existing models in both in-context learning and supervised fine-tuning settings.
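
The abstract does not specify the templates used for data generation. As a purely illustrative sketch of what template-based NL-to-PL/SQL pair generation could look like, the following fills a paired NL template and procedure template from a toy schema. The template wording, the PL/pgSQL-style dialect, and the names `NL_TEMPLATE`, `PLSQL_TEMPLATE`, and `generate_pairs` are all assumptions made for this example, not the authors' method.

```python
# Hypothetical paired templates (assumed for illustration only):
# one natural-language instruction and one PL/pgSQL-style procedure.
NL_TEMPLATE = (
    "Create a procedure that increases {column} by a given amount "
    "for every row in {table}."
)
PLSQL_TEMPLATE = (
    "CREATE OR REPLACE PROCEDURE bump_{table}_{column}(delta NUMERIC)\n"
    "LANGUAGE plpgsql AS $$\n"
    "BEGIN\n"
    "  UPDATE {table} SET {column} = {column} + delta;\n"
    "END;\n"
    "$$;"
)


def generate_pairs(schema):
    """Return (nl, plsql) pairs, one per table/column in the schema.

    `schema` maps a table name to a list of numeric column names.
    """
    pairs = []
    for table, columns in schema.items():
        for column in columns:
            nl = NL_TEMPLATE.format(table=table, column=column)
            plsql = PLSQL_TEMPLATE.format(table=table, column=column)
            pairs.append((nl, plsql))
    return pairs


if __name__ == "__main__":
    toy_schema = {"accounts": ["balance"], "items": ["price", "stock"]}
    for nl, plsql in generate_pairs(toy_schema):
        print(nl)
        print(plsql)
        print()
```

Unlike a Text-to-SQL pair, the target here is a stored procedure with a parameter and a `BEGIN ... END` body rather than a single query, which is the distinction the abstract draws between the two tasks.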