CVPR 2025
Reconstructing Animals and the Wild
Peter Kulits, Michael J. Black, Silvia Zuffi
Abstract
Figure 1. We train an LLM to decode a frozen CLIP embedding of a natural image into a structured, compositional scene representation that explicitly models both the animals and their surrounding environment, reconstructing animals and the wild (RAW).