ICLR 2026

EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing

Keming Wu, Sicong Jiang, Max Ku, Ping Nie, Minghao Liu, Wenhu Chen

Cited by 31

Abstract

Recently, we have witnessed great progress in image editing with natural language instructions. Several closed-source models, such as GPT-Image-1, Seedream, and Google-Nano-Banana, have shown highly promising results. However, open-source models still lag behind. The main bottleneck is the lack of a reliable reward model for scaling up high-quality synthetic training data. To address this critical bottleneck, we built EDITREWARD, trained on our new large-scale human preference dataset of over 200K preference pairs, meticulously annotated by trained experts following a rigorous protocol. EDITREWARD demonstrates superior alignment with human preferences in instruction-guided image editing tasks. Experiments show that EDITREWARD achieves state-of-the-art human correlation on established benchmarks such as GenAI-Bench, AURORA-Bench, ImagenHub, and our new EDITREWARD-BENCH, outperforming a wide range of VLM-as-judge models. Furthermore, we use EDITREWARD to select a high-quality subset from the existing noisy ShareGPT-4o-Image dataset. Training Step1X-Edit on this selected subset yields a significant improvement over training on the full set. This demonstrates EDITREWARD's ability to serve as a reward model for scaling up high-quality training data for image editing. EDITREWARD and its training dataset will be released to help the community build more high-quality image editing training datasets and catch up with frontier models.
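The abstract describes two steps: training a reward model on human preference pairs, then using its scores to keep only a high-quality subset of a noisy editing dataset. The sketch below illustrates both ideas under stated assumptions. The pairwise Bradley-Terry loss is a common choice for preference-trained reward models, but the abstract does not say which objective EDITREWARD uses, so treat it as an assumption. `EditSample`, `select_top_fraction`, the keep fraction, and the toy scores are all hypothetical stand-ins; a real pipeline would call the trained reward model on (source image, instruction, edited image) triples instead of looking up a dictionary.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EditSample:
    """Hypothetical (source image, instruction, edited image) triple; images are opaque IDs here."""
    source_id: str
    instruction: str
    edited_id: str


def bradley_terry_loss(score_preferred: float, score_rejected: float) -> float:
    """Pairwise logistic loss -log(sigmoid(s_w - s_l)), an ASSUMED training objective.

    Uses a numerically stable form of log(1 + exp(-margin)).
    """
    margin = score_preferred - score_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    # For negative margins: log(1 + exp(-m)) = -m + log(1 + exp(m))
    return -margin + math.log1p(math.exp(margin))


def select_top_fraction(
    samples: List[EditSample],
    score_fn: Callable[[EditSample], float],
    keep_fraction: float,
) -> List[EditSample]:
    """Keep the highest-scoring fraction of samples (hypothetical filtering rule)."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    ranked = sorted(samples, key=score_fn, reverse=True)
    k = max(1, int(len(ranked) * keep_fraction))
    return ranked[:k]


# Toy usage: made-up reward scores standing in for a real reward model's outputs.
toy_scores = {"e1": 0.9, "e2": 0.2, "e3": 0.7, "e4": 0.1}
data = [EditSample("src", "make the sky red", eid) for eid in toy_scores]
kept = select_top_fraction(data, lambda s: toy_scores[s.edited_id], keep_fraction=0.5)
print([s.edited_id for s in kept])  # ['e1', 'e3']
```

Ranking and truncating by score is only one possible selection rule; a fixed score threshold would work equally well in this sketch, and the abstract does not specify which criterion was used to build the ShareGPT-4o-Image subset.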