EMNLP2024

LUQ: Long-text Uncertainty Quantification for LLMs

Caiqi Zhang, Fangyu Liu, Marco Basaldella, Nigel Collier

11 citations

Abstract

Large Language Models (LLMs) have demonstrated remarkable capability in a variety of NLP tasks. However, LLMs are also prone to generate nonfactual content. Uncertainty Quantification (UQ) is pivotal in enhancing our understanding of a model's confidence on its generation, thereby aiding in the mitigation of nonfactual outputs. Existing research on UQ predominantly targets short text generation, typically yielding brief, word-limited responses. However, real-world applications frequently necessitate much longer responses. Our study first highlights the limitations of current UQ methods in handling long text generation. We then introduce LUQ with its two variations: LUQ-ATOMIC and LUQ-PAIR, a series of novel sampling-based UQ approaches specifically designed for long text. Our findings reveal that LUQ outperforms existing baseline methods in correlating with the model's factuality scores (negative coefficient of -0.85 observed for Gemini Pro). To further improve the factuality of LLM responses, we propose LUQ-ENSEMBLE, a method that ensembles responses from multiple models and selects the response with the lowest uncertainty. The ensembling method greatly improves the response factuality upon the best standalone LLM. 1 * Now at Google DeepMind. † Work done outside of Amazon.