CVPR 2024

Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation

Qin Guo, Tianwei Lin

Abstract

[Figure 1 panels: Input image; InstructPix2Pix; InstructPix2Pix + FoI. Example instructions shown: "What if she were in an anime? And put on a pair of sunglass. Then put her in a suit."; "What if she were in an anime?"; "Add cherry blossoms. And make it in sunset. Then insert two sailboats."; "Add cherry blossoms."]

Figure 1. Models like InstructPix2Pix (IP2P) [7] can edit images according to a given instruction. Yet they face challenges such as over-editing and wrong editing areas, especially with multiple instructions. Our FoI utilizes the inherent grounding capability of IP2P to identify precise editing regions, then focuses on them, enabling effective editing. Notably, FoI does not require extra training or test-time optimization.