ICLR 2026

EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing

Keming Wu, Sicong Jiang, Max Ku, Ping Nie, Minghao Liu, Wenhu Chen

31 citations

Abstract

Recently, we have witnessed great progress in image editing with natural language instructions. Several closed-source models such as GPT-Image-1, Seedream, and Google-Nano-Banana have shown highly promising results. However, open-source models still lag behind. The main bottleneck is the lack of a reliable reward model for scaling up high-quality synthetic training data. To address this bottleneck, we built EDITREWARD, trained on our new large-scale human preference dataset of over 200K preference pairs, meticulously annotated by trained experts following a rigorous protocol. EDITREWARD demonstrates superior alignment with human preferences in instruction-guided image editing tasks. Experiments show that EDITREWARD achieves state-of-the-art human correlation on established benchmarks such as GenAI-Bench, AURORA-Bench, and ImagenHub, as well as on our new EDITREWARD-BENCH, outperforming a wide range of VLM-as-judge models. Furthermore, we use EDITREWARD to select a high-quality subset from the existing noisy ShareGPT-4o-Image dataset. We train Step1X-Edit on the selected subset, which yields a significant improvement over training on the full set. This demonstrates EDITREWARD's ability to serve as a reward model for scaling up high-quality training data for image editing. EDITREWARD and its training dataset will be released to help the community build more high-quality image editing training datasets and catch up with the frontier models.
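
As an illustration of the subset-selection step mentioned in the abstract, the following is a minimal sketch of reward-based data filtering. It is not the authors' implementation: the function `select_top_fraction`, the `score_fn` callable standing in for a reward model, the `toy_score` field, and the `keep_fraction` value are all hypothetical, and the abstract does not state how large the selected subset is or how the cutoff is chosen.

```python
from typing import Callable, Dict, List

# Hypothetical sample layout: a dict describing one editing example
# (e.g. source image path, instruction, edited image path).
Sample = Dict[str, object]


def select_top_fraction(
    samples: List[Sample],
    score_fn: Callable[[Sample], float],
    keep_fraction: float = 0.5,  # assumed value, not taken from the paper
) -> List[Sample]:
    """Keep the highest-scoring fraction of samples, preserving original order."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not samples:
        return []

    # Score every sample once; score_fn stands in for a reward model call.
    scored = [(score_fn(sample), index) for index, sample in enumerate(samples)]

    # Sort by descending score; ties are broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))

    # Always keep at least one sample.
    n_keep = max(1, int(len(samples) * keep_fraction))

    # Restore dataset order for the kept indices.
    kept_indices = sorted(index for _, index in scored[:n_keep])
    return [samples[index] for index in kept_indices]
```

In practice `score_fn` would wrap a call to the reward model on each (source image, instruction, edited image) triple; a fixed score threshold could be used in place of a fraction.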