CVPR 2024

HIVE: Harnessing Human Feedback for Instructional Visual Editing

Shu Zhang, Xinyi Yang, Yihao Feng, Can Qin, Chia-Chih Chen, Ning Yu, Zeyuan Chen, Huan Wang, Silvio Savarese, Stefano Ermon, Caiming Xiong, Ran Xu


Abstract

[Figure 1 editing instructions: "Remove the red arch"; "Add a moon in the background"; "Add a sweater for the duck"; "Change the plant color to blue"]

Figure 1. We show four groups of representative results. In each triplet, from left to right are: the original image, InstructPix2Pix [7] using our data (IP2P-Ours), and HIVE. We observe that HIVE leads to more acceptable results than the model without human feedback. For instance, in the left two examples, IP2P-Ours understands the editing instructions "remove" and "change to blue" individually, but fails to understand the corresponding objects. Human feedback resolves this ambiguity, as shown in the other examples as well.