CVPR 2025

Seeing Speech and Sound: Distinguishing and Locating Audio Sources in Visual Scenes

Hyeonggon Ryu, Seongyu Kim, Joon Son Chung, Arda Senocak

Abstract

[Figure: panel labels: DenseAV; "Look at the cello. I like that cello."; "Look at the cello."; "I like that cello."]