EMNLP 2024

FinDVer: Explainable Claim Verification over Long and Hybrid-content Financial Documents

Yilun Zhao, Yitao Long, Tintin Jiang, Chengye Wang, Weiyuan Chen, Hongjun Liu, Xiangru Tang, Yiming Zhang, Chen Zhao, Arman Cohan

Cited by 3

Abstract

We introduce FINDVER, a comprehensive benchmark designed to evaluate the explainable claim verification capabilities of LLMs when understanding and analyzing long, hybrid-content financial documents. FINDVER contains 2,400 expert-annotated examples, divided into three subsets: information extraction, numerical reasoning, and knowledge-intensive reasoning, each addressing common scenarios encountered in real-world financial contexts. We assess a broad spectrum of LLMs under long-context and RAG settings. Our results show that even the current best-performing system, GPT-4o, still lags behind human experts. We further provide in-depth analysis of the long-context and RAG settings, Chain-of-Thought reasoning, and model reasoning errors, offering insights to drive future advancements. We believe that FINDVER can serve as a valuable benchmark for evaluating LLMs in claim verification over complex, expert-domain documents.