CVPR 2025

Let's Verify and Reinforce Image Generation Step by Step

Renrui Zhang, Chengzhuo Tong, Zhizheng Zhao, Ziyu Guo, Haoquan Zhang, Manyuan Zhang, Jiaming Liu, Peng Gao, Hongsheng Li


Abstract

5 SIAT; 6 CPII under InnoHK

A. Related Work

Scaling Test-time Computation. Humans often dedicate significant time and effort to solving complex problems. Inspired by this, many efforts have focused on scaling test-time computation for Large Language Models (LLMs) to tackle reasoning tasks such as mathematical problem-solving [? ? ? ? ], code synthesis [? ? ? ], and workflow generation [? ? ? ]. One line of research adapts the input space to leverage Chain-of-Thought (CoT) capabilities, using approaches such as in-context CoT examples [? ] or zero-shot CoT prompts [? ]. Another branch modifies or integrates reasoning paths within the output space, using strategies such as self-consistency [? ], CoT decoding [? ], and verifier-based selection [? ? ? ]. Among these, test-time verifiers have shown both generality and robustness in improving reasoning performance. For example, early work [? ] trains an Outcome Reward Model (ORM) to score final outputs and select the best of N candidates. Later, Lightman et al. [? ? ] adopt a Process Reward Model (PRM) that evaluates each intermediate reasoning step, which proves more effective. Snell et al. [? ] further highlight that scaling test-time computation is often more impactful than scaling model parameters during training. Recently, OpenAI o1 [? ] has demonstrated exceptional reasoning capabilities across a variety of complex and challenging scenarios, underscoring the potential of this approach. Building on these advances in understanding tasks, we conduct a comprehensive investigation into whether verifier-based strategies can also enhance image generation, and propose a new Potential Assessment Reward Model (PARM) designed specifically for this domain.
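To make the contrast between outcome-level and process-level verification concrete, the following is a minimal Python sketch of verifier-based best-of-N selection over generic step-by-step chains. It also includes the verifier-free self-consistency baseline. The reward functions `toy_orm` and `toy_prm` are hypothetical stand-ins for trained verifiers. Aggregating a chain's step scores by their minimum is an assumption; products of step scores are also used. The sketch illustrates the general technique only and is not the PARM proposed here.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A candidate is a list of intermediate steps; its last element is the final answer.
Candidate = List[str]


def best_of_n_orm(candidates: Sequence[Candidate],
                  orm: Callable[[Candidate], float]) -> Candidate:
    """Outcome-level selection: score each complete candidate once and keep the best.

    Python's max() returns the first candidate among equal scores.
    """
    return max(candidates, key=orm)


def best_of_n_prm(candidates: Sequence[Candidate],
                  prm: Callable[[Candidate], float]) -> Candidate:
    """Process-level selection: score every prefix (one per step), then aggregate.

    Assumption: a chain is only as good as its weakest step (min aggregation).
    """
    def chain_score(c: Candidate) -> float:
        return min(prm(c[: i + 1]) for i in range(len(c)))
    return max(candidates, key=chain_score)


def self_consistency(candidates: Sequence[Candidate]) -> str:
    """Verifier-free baseline: majority vote over final answers."""
    answers = [c[-1] for c in candidates]
    return Counter(answers).most_common(1)[0][0]


# Hypothetical toy verifiers, for illustration only.
def toy_orm(c: Candidate) -> float:
    # Looks only at the final answer.
    return 1.0 if c[-1] == "answer: 4" else 0.0


def toy_prm(prefix: Candidate) -> float:
    # Scores the newest step of a partial chain.
    last = prefix[-1]
    if "guess" in last:
        return 0.2
    if "2+2=5" in last or last == "answer: 5":
        return 0.1
    return 0.9


candidates = [
    ["step: guess", "answer: 4"],   # right answer reached through a flawed step
    ["step: 2+2=4", "answer: 4"],   # right answer reached through a sound step
    ["step: 2+2=5", "answer: 5"],   # wrong answer
]

print(best_of_n_orm(candidates, toy_orm))   # ['step: guess', 'answer: 4']
print(best_of_n_prm(candidates, toy_prm))   # ['step: 2+2=4', 'answer: 4']
print(self_consistency(candidates))         # answer: 4
```

In this toy case, the ORM cannot distinguish the two chains that reach the correct answer, so it keeps the first one. The PRM penalizes the flawed intermediate step and prefers the chain whose every step is sound. This is the intuition behind step-level verification.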