CVPR2025

PreciseCam: Precise Camera Control for Text-to-Image Generation

Edurne Bernal-Berdun, Ana Serrano, Belén Masiá, Matheus Gadelha, Yannick Hold-Geoffroy, Xin Sun, Diego Gutierrez

Abstract

[Figure 1 image: generations for the prompts "A photograph of an inviting reading room with a large armchair, a low wooden table, and floor-to-ceiling bookshelves packed with books, softly lit by a standing lamp." and "An impressionist painting of a serene lakeside at dawn. Soft brushstrokes blend the colors of the sky, water, and distant mountains. The lake reflects the pastel hues of the rising sun, and small details of trees and boats blur into the overall mood of tranquility.", with a row labeled "Pitch".]

Figure 1. Our approach enhances the artistic expression of text-to-image generative models by incorporating precise control over camera angles and lens distortion effects. Left: Our input consists of a standard text prompt along with extrinsic (roll and pitch) and intrinsic (vertical field of view and distortion ξ) camera parameters, which are translated into a suitable and efficient representation for learning camera views. Right: Examples varying roll (top) and pitch (bottom) with the same prompt, while keeping the remaining camera parameters fixed.
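The caption does not say what the "suitable and efficient representation" is. As an illustration only, the sketch below shows one plausible way to turn the four camera parameters into a dense per-pixel map: a latitude map (angle of each pixel's viewing ray above the horizon). It assumes the unified spherical camera model, a common model in which ξ controls radial distortion (ξ = 0 is a pinhole), and a hypothetical sign convention (positive pitch tilts the camera up, roll rotates about the optical axis). The function names are hypothetical and this is not presented as the authors' method.

```python
import math

def focal_from_vfov(height, vfov, xi):
    """Focal length in pixels so the top/bottom image edges lie at +/- vfov/2.

    Assumes the unified spherical model: y = f * y_s / (z_s + xi).
    """
    half = vfov / 2.0
    return (height / 2.0) * (math.cos(half) + xi) / math.sin(half)

def unproject(u, v, width, height, f, xi):
    """Back-project pixel (u, v) to a unit ray (x right, y down, z forward).

    Inverse of the unified spherical model; valid for 0 <= xi <= 1.
    """
    mx = (u - width / 2.0) / f
    my = (v - height / 2.0) / f
    r2 = mx * mx + my * my
    factor = (xi + math.sqrt(1.0 + (1.0 - xi * xi) * r2)) / (r2 + 1.0)
    return (factor * mx, factor * my, factor - xi)

def up_vector(roll, pitch):
    """World up direction in camera coordinates (hypothetical convention).

    With roll = pitch = 0 the up vector is -y (image y points down).
    """
    return (math.sin(roll) * math.cos(pitch),
            -math.cos(roll) * math.cos(pitch),
            math.sin(pitch))

def pixel_latitude(u, v, width, height, roll, pitch, vfov, xi):
    """Angle (radians) of the ray through pixel (u, v) above the horizon."""
    f = focal_from_vfov(height, vfov, xi)
    ray = unproject(u, v, width, height, f, xi)
    up = up_vector(roll, pitch)
    dot = sum(a * b for a, b in zip(ray, up))
    return math.asin(max(-1.0, min(1.0, dot)))  # clamp for float safety

def latitude_map(width, height, roll, pitch, vfov, xi):
    """Dense latitude map sampled at pixel centers, as a list of rows."""
    return [[pixel_latitude(u + 0.5, v + 0.5, width, height,
                            roll, pitch, vfov, xi)
             for u in range(width)]
            for v in range(height)]
```

For example, `latitude_map(64, 48, 0.0, math.radians(10), math.radians(60), 0.3)` yields a map whose center value is close to the 10° pitch and whose values shift with roll and distortion. A dense map like this is image-aligned, which is one reason such representations are often preferred over feeding raw scalar parameters to a convolutional or diffusion backbone.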