NeurIPS 2024

Looks Too Good To Be True: An Information-Theoretic Analysis of Hallucinations in Generative Restoration Models

Regev Cohen, Idan Kligvasser, Ehud Rivlin, Daniel Freedman

Abstract

The pursuit of high perceptual quality in image restoration has driven the development of revolutionary generative models capable of producing results that are often visually indistinguishable from real data. However, as their perceptual quality continues to improve, these models also exhibit a growing tendency to generate hallucinations: realistic-looking details that do not exist in the ground-truth images. Hallucinations in these models create uncertainty about their reliability, raising major concerns about their practical application. This paper investigates this phenomenon through the lens of information theory, revealing a fundamental tradeoff between uncertainty and perception. We rigorously analyze the relationship between these two factors, proving that the global minimal uncertainty in generative models grows in tandem with perception. In particular, we define the inherent uncertainty of the restoration problem and show that attaining perfect perceptual quality entails at least twice this uncertainty. Additionally, we establish a relation between distortion, uncertainty and perception, through which we prove that the aforementioned uncertainty-perception tradeoff induces the well-known perception-distortion tradeoff. We demonstrate our theoretical findings through experiments with super-resolution and inpainting algorithms. This work uncovers fundamental limitations of generative models in achieving both high perceptual quality and reliable predictions for image restoration. Thus, we aim to raise awareness among practitioners about this inherent tradeoff, empowering them to make informed decisions and potentially prioritize safety over perceptual performance.

1. We introduce a definition for the inherent uncertainty $U_{\text{Inherent}}$ of an inverse problem, and formulate the uncertainty-perception (UP) function, which seeks the minimal attainable uncertainty for a given perceptual index. We prove the UP function is globally lower-bounded by $U_{\text{Inherent}}$ (Theorem 1).
2. We prove a fundamental tradeoff between uncertainty and perception under any underlying data distribution, restoration problem or model (Theorem 1). Specifically, the entropy power of the recovery error exhibits a lower bound inversely related to the Rényi divergence between the true and recovered image distributions (Theorem 3). This shows that perfect perceptual quality requires at least twice the inherent uncertainty $U_{\text{Inherent}}$.
3. We establish a relationship between uncertainty and mean squared error (MSE) distortion, demonstrating that the uncertainty-perception tradeoff induces the well-known distortion-perception tradeoff [14] (Theorem 4).
4. We empirically validate all theoretical findings through experiments on image super-resolution and inpainting (Section 5), covering a broad spectrum of recovery algorithms, diverse metrics and data distributions. Our experimental results for image inpainting are illustrated in Figure 2.

We aim to provide practitioners with a deeper understanding of the tradeoff between uncertainty and perceptual quality, allowing them to strategically navigate this balance and prioritize safety when deploying generative models in real-world, sensitive applications.

Related Work

Recent work in image restoration has made significant strides in both perceptual quality assessment and uncertainty quantification, largely independently. Below, we outline the main trends in research on these topics, laying the foundation for our framework.

Perception Quantification

Perceptual quality in restoration tasks encompasses how humans perceive the output, considering visual fidelity, similarity to the original, and absence of artifacts. While traditional metrics like PSNR and SSIM [82] capture basic similarity, they miss finer details and higher-level structures. Learned metrics like LPIPS [87], VGG-loss [72], and DISTS [22] offer improvements but still operate at the pixel or patch level, potentially overlooking holistic aspects.
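As a toy illustration of why PSNR can miss structure (a sketch that is not taken from the paper), the snippet below compares two distortions of the same 1-D step edge. The helper names `mse` and `psnr`, the signal and the peak value of 255 are all assumptions made for this example. Because PSNR depends on the images only through their MSE, a uniform brightness shift and an alternating high-frequency pattern of equal magnitude receive the same score.

```python
import math

def mse(a, b):
    """Mean squared error between two equally sized flat pixel lists."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)

# Hypothetical 1-D "image": a clean step edge.
reference = [50.0] * 8 + [200.0] * 8

# Distortion A: uniform +10 brightness shift (edge structure preserved).
shifted = [p + 10.0 for p in reference]

# Distortion B: alternating +10 / -10 (adds a high-frequency pattern).
textured = [p + (10.0 if i % 2 == 0 else -10.0)
            for i, p in enumerate(reference)]

psnr_shifted = psnr(reference, shifted)
psnr_textured = psnr(reference, textured)

# Both distortions have MSE = 100, so both PSNR values are about 28.13 dB.
print(f"shifted:  {psnr_shifted:.2f} dB")
print(f"textured: {psnr_textured:.2f} dB")
```

The two outputs look quite different to a viewer, yet the metric cannot tell them apart, which is one motivation for the learned and distribution-level measures discussed in this section.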
Recently, researchers have leveraged image-level embeddings from large vision models like DINO [17] and CLIP [62] to capture high-level similarity. Further advancements include HyperIQA [74], which leverages a self-adaptive hyper network to blindly assess image quality in the wild, while LIQE [88] and QAlign [84] utilize large language models to capture high-level semantic similarity and alignment between the restored and original images. Here, we follow previous works [58, 14, 31] and adopt a mathematical notion of perceptual quality, defined as the divergence between probability densities.

Uncertainty Quantification

Uncertainty quantification techniques can be broadly categorized into two main paradigms: Bayesian estimation and frequentist approaches. The Bayesian paradigm defines uncertainty by assuming a distribution over the model parameters and/or activation functions [1]. The most prevalent approach is Bayesian neural networks [52, 78, 34], which are stochastic models trained using Bayesian inference. To improve ef