SIGMOD 2024
Table-GPT: Table Fine-tuned GPT for Diverse Table Tasks
Peng Li, Yeye He, Dror Yashar, Weiwei Cui, Song Ge, Haidong Zhang, Danielle Rifinski Fainman, Dongmei Zhang, Surajit Chaudhuri
Abstract
Language models such as GPT-3 and ChatGPT, once instruction fine-tuned, demonstrate remarkable abilities to follow diverse human instructions and perform a wide range of tasks. However, when we test language models on a range of basic table-understanding tasks, we observe that today's language models are still sub-optimal on many table-related tasks, likely because they are pre-trained predominantly on one-dimensional natural-language text, whereas relational tables are two-dimensional objects. In this work, we propose a new "fine-tuning" paradigm, in which we continue to train/fine-tune language models like GPT-3.5 and ChatGPT using diverse table tasks synthesized from real tables as training data. This is analogous to "instruction fine-tuning", but with the goal of enhancing language models' ability to understand tables and perform table tasks. We show that our resulting models demonstrate: (1) better table-understanding capabilities, consistently outperforming vanilla GPT-3.5 and ChatGPT on a wide range of table tasks (data transformation, data cleaning, data profiling, data imputation, table-QA, etc.), including tasks that are completely held out and unseen during training, and (2) strong generalizability, in their ability to respond to diverse human instructions to perform new and unseen table tasks, in a manner similar to GPT-3.5 and ChatGPT. Our code and data have been released at https://github.com/microsoft/Table-GPT for future research.
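To make the idea of "table tasks synthesized from real tables" concrete, the sketch below builds one data-imputation training example by hiding a single cell of a small table and asking for its value. It is an illustration only, not the released implementation: the function names, the pipe-delimited serialization, the `[MISSING]` marker, the prompt wording, and the toy table are all assumptions made for this example.

```python
import random


def serialize_table(header, rows):
    """Render a table as pipe-delimited text, one row per line (assumed format)."""
    lines = ["|" + "|".join(header) + "|"]
    lines += ["|" + "|".join(str(cell) for cell in row) + "|" for row in rows]
    return "\n".join(lines)


def synthesize_imputation_task(header, rows, rng):
    """Hide one randomly chosen cell and pair the masked table with its true value."""
    r = rng.randrange(len(rows))
    c = rng.randrange(len(header))
    answer = str(rows[r][c])
    # Copy the rows so the source table is left unchanged.
    masked = [list(row) for row in rows]
    masked[r][c] = "[MISSING]"
    prompt = (
        "Fill in the cell marked [MISSING] in the table below.\n\n"
        + serialize_table(header, masked)
    )
    return {"prompt": prompt, "completion": answer}


# Hypothetical toy table standing in for a "real" table.
header = ["city", "country"]
rows = [["Paris", "France"], ["Tokyo", "Japan"], ["Lima", "Peru"]]
example = synthesize_imputation_task(header, rows, random.Random(0))
print(example["prompt"])
print("expected completion:", example["completion"])
```

Repeating this over many tables, and over other task templates such as cleaning or transformation, would yield the kind of (instruction, table, completion) training data the abstract describes, under the assumptions stated above.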