CVPR 2025

Reconstructing Animals and the Wild

Peter Kulits, Michael J. Black, Silvia Zuffi

Abstract

Figure 1. We train an LLM to decode a frozen CLIP embedding of a natural image into a structured, compositional scene representation encompassing explicit representations of both animals and their surrounding environment, reconstructing animals and the wild (RAW).
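The caption does not say how the frozen CLIP embedding is passed to the LLM. One common way to condition a decoder-only LLM on a fixed image embedding, used for example in ClipCap, is to project it into a few "prefix" token embeddings that precede the text tokens. The NumPy sketch below assumes that technique. The function name `clip_to_prefix`, the dimensions, and the random weights are hypothetical and are not taken from RAW.

```python
import numpy as np


def clip_to_prefix(clip_embedding, weight, bias, n_prefix):
    """Map one frozen CLIP embedding to n_prefix LLM input vectors.

    Hypothetical ClipCap-style linear mapping (an assumption, not RAW's
    documented design). The CLIP embedding is treated as a fixed input;
    only `weight` and `bias` would be trained.
    """
    # L2-normalize, since CLIP embeddings are usually compared by cosine similarity.
    e = clip_embedding / np.linalg.norm(clip_embedding)
    # One linear layer produces n_prefix * llm_dim values...
    flat = e @ weight + bias
    # ...which are reshaped into n_prefix pseudo-token embeddings.
    return flat.reshape(n_prefix, -1)


rng = np.random.default_rng(0)
clip_dim, llm_dim, n_prefix = 512, 64, 4  # hypothetical sizes

clip_embedding = rng.standard_normal(clip_dim)  # stands in for a frozen CLIP image embedding
weight = rng.standard_normal((clip_dim, n_prefix * llm_dim)) * 0.02
bias = np.zeros(n_prefix * llm_dim)

prefix = clip_to_prefix(clip_embedding, weight, bias, n_prefix)
print(prefix.shape)  # (4, 64)
```

In such a setup, the `prefix` rows would be prepended to the LLM's token embeddings, and the model would be trained to emit the serialized scene description autoregressively.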