ICLR 2026

TrustGen: A Platform of Dynamic Benchmarking on the Trustworthiness of Generative Foundation Models

Yue Huang, Chujie Gao, Siyuan Wu, Haoran Wang, Xiangqi Wang, Jiayi Ye, Yujun Zhou, Yanbo Wang, Jiawen Shi, Qihui Zhang, Han Bao, Zhaoyi Liu, Yuan Li, Tianrui Guan, Peiran Wang, Haomin Zhuang, Dongping Chen, Kehan Guo, Andy Zou, Bryan Hooi, Caiming Xiong, Elias Stengel-Eskin, Hongyang Zhang, Hongzhi Yin, Huan Zhang, Huaxiu Yao, Jieyu Zhang, Jaehong Yoon, Kai Shu, Ranjay Krishna, Swabha Swayamdipta, Weijia Shi, Xiang Li, Yuexing Hao, Zhihao Jia, Zhize Li, Xiuying Chen, Zhengzhong Tu, Xiyang Hu, Tianyi Zhou, Jieyu Zhao, Lichao Sun, Furong Huang, Or Cohen-Sasson, Prasanna Sattigeri, Anka Reuel, Max Lamparth, Yue Zhao, Nouha Dziri, Yu Su, Huan Sun, Heng Ji, Chaowei Xiao, Mohit Bansal, Nitesh V Chawla, Jian Pei, Jianfeng Gao, Michael Backes, Philip S. Yu, Neil Zhenqiang Gong, Pin-Yu Chen, Bo Li, Dawn Song, Xiangliang Zhang

Abstract

Generative foundation models (GenFMs), such as large language models and text-to-image systems, have demonstrated remarkable capabilities in a wide range of downstream applications. As they are increasingly deployed in high-stakes settings, assessing their trustworthiness has become both a necessity and a substantial challenge. Existing evaluation efforts are fragmented, quickly become outdated, and often lack extensibility across modalities. This raises a fundamental question: how can we systematically, reliably, and continuously assess the trustworthiness of rapidly advancing GenFMs across diverse modalities and use cases? To address these gaps, we introduce TrustGen, a dynamic and modular benchmarking system designed to evaluate the trustworthiness of GenFMs across text-to-image models, large language models, and vision-language models. TrustGen standardizes trust evaluation through a unified taxonomy of over 25 fine-grained dimensions, including truthfulness, safety, fairness, robustness, privacy, and machine ethics, while supporting dynamic data generation and adaptive evaluation through three core modules: Metadata Curator, Test Case Builder, and Contextual Variator. Applying TrustGen to evaluate the trustworthiness of 39 models reveals four key insights. (1) State-of-the-art GenFMs achieve promising overall trust performance, yet significant limitations remain in specific dimensions such as hallucination resistance, fairness, and privacy preservation. (2) Contrary to prevailing assumptions, open-source models now rival and occasionally surpass proprietary systems on trustworthiness metrics. (3) The trust gap among top-performing models is narrowing, likely due to increased industry convergence on best practices. (4) Trustworthiness is not an isolated property; it interacts in complex ways with other behaviors, such as helpfulness and ethical decision-making.
TrustGen is a transformative step toward standardized, scalable, and actionable trustworthiness evaluation, supporting dynamic assessments across diverse modalities and trust dimensions that evolve alongside the generative AI landscape.