ACL 2024

Verifiable Generation with Subsentence-Level Fine-Grained Citations

Shuyang Cao, Lu Wang

Cited 3 times

Abstract

Verifiable generation requires large language models (LLMs) to cite source documents supporting their outputs, thereby improving output transparency and trustworthiness. Yet previous work mainly targets the generation of sentence-level citations, lacking specificity about which part of a sentence is backed by which cited source. This work studies verifiable generation with subsentence-level fine-grained citations to locate more precisely the generated content that is supported by the cited sources. We first present a dataset, SCIFI, comprising 10K Wikipedia paragraphs with subsentence-level citations. Each paragraph in SCIFI is paired with a set of candidate source documents for citation and a query that triggers the generation of the paragraph content. On SCIFI, we then evaluate the performance of state-of-the-art LLMs and strategies for processing long documents designed for these models. Our experimental results reveal key factors that can enhance the quality of citations, including expanding the source document context accessible to the models and applying specialized model tuning.