ACL 2023

QueryForm: A Simple Zero-shot Form Entity Query Framework

Zifeng Wang, Zizhao Zhang, Jacob Devlin, Chen-Yu Lee, Guolong Su, Hao Zhang, Jennifer G. Dy, Vincent Perot, Tomas Pfister


Abstract

Zero-shot transfer learning for document understanding is a crucial yet under-investigated scenario that can help reduce the high cost of annotating document entities. We present a novel query-based framework, QueryForm, that extracts entity values from form-like documents in a zero-shot fashion. QueryForm contains a dual prompting mechanism that composes both the document schema and a specific entity type into a query, which is used to prompt a Transformer model to perform a single entity extraction task. Furthermore, we propose to leverage large-scale query-entity pairs generated from form-like webpages with weak HTML annotations to pre-train QueryForm. By unifying pre-training and fine-tuning into the same query-based framework, QueryForm enables models to learn from structured documents containing various entities and layouts, leading to better generalization to target document types without the need for target-specific training data. QueryForm sets a new state-of-the-art average F1 score on both the XFUND (+4.6%∼10.1%) and the Payment (+3.2%∼9.5%) zero-shot benchmarks, with a smaller model size and no additional image input.
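
The dual prompting idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the query format, so everything here is an assumption made for illustration: the function names `compose_query` and `build_model_input`, the `schema:` and `entity:` prefixes, the separators, and the `[SEP]` token are all hypothetical and are not taken from the paper.

```python
from typing import List


def compose_query(schema: List[str], entity_type: str, sep: str = " | ") -> str:
    """Compose a schema prompt and an entity prompt into one query string.

    Hypothetical format: the paper's actual query layout may differ.
    """
    if entity_type not in schema:
        raise ValueError(f"{entity_type!r} is not part of the document schema")
    schema_prompt = "schema: " + sep.join(schema)   # describes the document type
    entity_prompt = "entity: " + entity_type        # names the one entity to extract
    return schema_prompt + " ; " + entity_prompt


def build_model_input(query: str, document_tokens: List[str]) -> List[str]:
    """Prepend the query to the document tokens (assumed [SEP] delimiter)."""
    return query.split() + ["[SEP]"] + document_tokens


# One query per entity type, so each model call is a single extraction task.
schema = ["invoice_id", "total", "due_date"]
query = compose_query(schema, "total")
model_input = build_model_input(query, ["Total", ":", "$42"])
```

Under these assumptions, extracting every entity in a schema would mean issuing one such query per entity type against the same document.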