CVPR 2023

Visual Prompt Tuning for Generative Transfer Learning

Kihyuk Sohn, Huiwen Chang, José Lezama, Luisa Polania, Han Zhang, Yuan Hao, Irfan Essa, Lu Jiang

Abstract

Figure 1. Image synthesis by knowledge transfer. Unlike previous works that use GANs as the base model and test transfer on relatively narrow visual domains, we transfer the knowledge of generative vision transformers [7, 15] to a wide range of visual domains, including natural (e.g., scene, flower), specialized (e.g., satellite, medical), and structured (e.g., road scenes, infograph, sketch), using only a few training images. Notably, prompt tuning significantly improves the prior best FID on two benchmarks, ImageNet (85.9 → 16.3) and Places (71.3 → 24.2).