EMNLP 2023
ToolWriter: Question Specific Tool Synthesis for Tabular Data
Carlos Gemmell, Jeff Dalton
Cited by 2
Abstract
Tabular question answering (TQA) is a challenging setting for neural systems because it requires joint reasoning over natural language and large amounts of semi-structured data. Humans use programmatic tools such as filters to transform data before processing it. Language models in TQA, by contrast, process tables directly, which causes information loss as table size increases. In this paper we propose ToolWriter, which generates query-specific programs and detects when to apply them, so that tables are transformed and aligned with the TQA model's capabilities. Focusing ToolWriter on generating row-filtering tools improves the state of the art on WikiTableQuestions and WikiSQL, with the largest performance gains on long tables. By investigating headroom, our work highlights the broader potential of combining programmatic tools with neural components to manipulate large amounts of structured data.

[Figure: For the question "How many monarchs died before the age of 35?", a generated row filter on the Deceased column (>>> table.apply(lambda row: float(row['Deceased'].split('aged')[1]) < 35 ...) reduces a 295 rows x 5 columns table (James I aged 68, Alfonso I aged 27, Sancho aged 28, James II aged 69, ...) to the matching rows (James III aged 34, Alfonso I aged 27, Sancho aged 28, ...), which are then passed to the TQA model.]
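The row filter in the monarchs example (keep rows whose Deceased age is below 35) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not ToolWriter's implementation. The table is a list of dicts rather than the dataframe-style `table` object used in the example. The `Name` column and the bare "aged N" cell format are hypothetical simplifications of the real monarchs table. The fallback rule (keep the full table if the tool raises an error or removes every row) is a hypothetical stand-in for the paper's decision of when to apply a tool.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Table = List[Row]


def deceased_age(row: Row) -> float:
    """Parse the age that follows 'aged' in a Deceased cell, e.g. 'aged 68' -> 68.0."""
    return float(row["Deceased"].split("aged")[1])


def apply_row_filter(table: Table, keep: Callable[[Row], bool]) -> Table:
    """Apply a generated row-filtering tool to a table.

    Hypothetical gating rule (an assumption, not from the paper): if the tool
    crashes on a row or keeps no rows at all, return the original table so the
    downstream TQA model still sees every row.
    """
    try:
        filtered = [row for row in table if keep(row)]
    except (KeyError, IndexError, ValueError):
        return table  # tool does not fit this table's format
    return filtered if filtered else table


# Toy excerpt of the monarchs table; 'Name' is a hypothetical column name.
monarchs: Table = [
    {"Name": "James I", "Deceased": "aged 68"},
    {"Name": "Alfonso I", "Deceased": "aged 27"},
    {"Name": "Sancho", "Deceased": "aged 28"},
    {"Name": "James II", "Deceased": "aged 69"},
    {"Name": "James III", "Deceased": "aged 34"},
]

# Question-specific tool for "How many monarchs died before the age of 35?"
young = apply_row_filter(monarchs, lambda row: deceased_age(row) < 35)
print(len(young), [row["Name"] for row in young])  # 3 ['Alfonso I', 'Sancho', 'James III']
```

Here the answer is computed by counting the filtered rows only to make the effect of the filter visible; in the setting described above, the shorter filtered table would instead be handed to the neural TQA model, which is where the reduced table size is meant to help.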