CVPR2025
Scaling Properties of Diffusion Models For Perceptual Tasks
Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran, Jitendra Malik
摘要
We fine-tune a pre-trained Diffusion Model (DM) for visual perception tasks. We take a RGB image, and a conditional image (i.e. next video frame, occlusion mask, etc.), along with the noised image of the ground truth prediction. Our model generates predictions for visual tasks such as depth estimation, optical flow prediction, and amodal segmentation, based on the conditional task embedding. We train a generalist model that can perform all three tasks with exceptional performance.