EMNLP 2025

ESGenius: Benchmarking LLMs on Environmental, Social, and Governance (ESG) and Sustainability Knowledge

Chaoyue He, Xin Zhou, Yi Wu, Xinjia Yu, Yan Zhang, Lei Zhang, Di Wang, Shengfei Lyu, Hong Xu, Xiaoqiao Wang, Wei Liu, Chunyan Miao

Cited 3 times

Abstract

Figure 1: The ESGenius pipeline for benchmark creation and model evaluation. The process begins with the ESGenius-Corpus, a curated collection of 231 authoritative PDFs from 7 key ESG sources. Text extracted from these documents feeds an LLM-driven pipeline that generates candidate multiple-choice questions (MCQs). Domain experts then validate the candidates to produce the final ESGenius-QA dataset of 1136 high-quality MCQs (an example question is shown above). Finally, the dataset is used to benchmark 50 LLMs from 5 renowned families, ranging in size from 0.5B to 671B parameters and accessed either as open-source (OSS) models or through proprietary APIs. Evaluation covers two settings: a zero-shot setup that tests the models' inherent ESG knowledge, and a retrieval-augmented generation (RAG) setup that assesses their ability to synthesize information from the source documents.
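The two evaluation settings in the caption can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the `MCQ` fields, the prompt wording, the `build_prompt`, `parse_choice`, and `accuracy` helpers, and the idea of passing the question's source passage directly as context (in place of a real retriever) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass
class MCQ:
    """Hypothetical record layout for one multiple-choice question."""
    question: str
    options: Dict[str, str]  # option letter -> option text
    answer: str              # gold option letter
    source_text: str = ""    # passage the question was derived from


def build_prompt(mcq: MCQ, context: Optional[str] = None) -> str:
    """Format a prompt; the context is included only in the RAG setting."""
    lines = []
    if context:
        lines.append("Context:\n" + context + "\n")
    lines.append("Question: " + mcq.question)
    for letter in sorted(mcq.options):
        lines.append(f"{letter}. {mcq.options[letter]}")
    lines.append("Answer with a single option letter.")
    return "\n".join(lines)


def parse_choice(reply: str, valid: Sequence[str]) -> Optional[str]:
    """Assume the model's reply starts with its chosen option letter."""
    first = reply.strip()[:1].upper()
    return first if first in valid else None


def accuracy(mcqs: Sequence[MCQ], model: Callable[[str], str], use_rag: bool) -> float:
    """Fraction of questions answered correctly in the chosen setting."""
    correct = 0
    for q in mcqs:
        # Stand-in for retrieval: hand over the source passage as context.
        ctx = q.source_text if use_rag else None
        pred = parse_choice(model(build_prompt(q, ctx)), list(q.options))
        correct += int(pred == q.answer)
    return correct / len(mcqs)
```

Here `model` is any callable that maps a prompt string to a reply string, so the same loop can wrap a local open-source model or a proprietary API client. Calling `accuracy` once with `use_rag=False` and once with `use_rag=True` gives the zero-shot and RAG scores for that model.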