NeurIPS 2024

Images that Sound: Composing Images and Sounds on a Single Canvas

Ziyang Chen, Daniel Geng, Andrew Owens

Abstract

https://ificl.github.io/images-that-sound/

[Figure 1 panels: two examples, each shown as a color image, a spectrogram (axes: time, frequency), and a waveform (axis: time). Prompts: "bell ringing" with "castle with bell towers, grayscale, lithograph style"; "tiger growling" with "tigers, grayscale, black background".]

Figure 1: Images that sound. We use diffusion models to generate visual spectrograms (second row) that look like natural images, which we call images that sound. These spectrograms can be converted into natural sounds (third row) using a pretrained vocoder, or colorized to obtain more visually pleasing results (first row). Please refer to our website to listen to the sounds.
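As a rough illustration of the caption's idea that a grayscale image can be read as a spectrogram and turned into audio, here is a minimal sketch in Python. It does not use the pretrained vocoder the figure refers to; it swaps in a much cruder technique (random phase plus inverse FFT with overlap-add), so the result is noisy. The function name `image_to_waveform`, the 80 dB dynamic range, the linear frequency axis, and the hop size are all assumptions made for this example, not details from the paper.

```python
import numpy as np


def image_to_waveform(image, hop=None, dynamic_range_db=80.0, seed=0):
    """Read a grayscale image as a magnitude spectrogram and synthesize audio.

    Hypothetical helper: `image` is a 2D array with values in [0, 1],
    rows = frequency bins (top row = highest frequency), columns = time frames.
    """
    image = np.asarray(image, dtype=np.float64)
    if image.ndim != 2 or image.shape[0] < 2:
        raise ValueError("expected a 2D grayscale image with at least 2 rows")

    # Images put row 0 at the top; flip so that row 0 is the lowest frequency.
    spec = np.flipud(np.clip(image, 0.0, 1.0))
    n_bins, n_frames = spec.shape
    n_fft = 2 * (n_bins - 1)  # frame length implied by the number of bins
    if hop is None:
        hop = max(1, n_fft // 4)

    # Assumed mapping: pixel 1.0 -> 0 dB, pixel 0.0 -> -dynamic_range_db.
    magnitude = 10.0 ** ((spec - 1.0) * dynamic_range_db / 20.0)

    # An image carries no phase, so draw it at random (the crude part).
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=spec.shape)

    # Inverse real FFT of every column gives one time-domain frame per column.
    frames = np.fft.irfft(magnitude * np.exp(1j * phase), n=n_fft, axis=0)

    # Window each frame and overlap-add the frames into a single waveform.
    window = np.hanning(n_fft)
    waveform = np.zeros(n_fft + hop * (n_frames - 1))
    for t in range(n_frames):
        waveform[t * hop : t * hop + n_fft] += window * frames[:, t]
    return waveform


# Usage: a synthetic 65 x 10 "image" with a vertical brightness gradient.
example_image = np.linspace(0.0, 1.0, 65)[:, None] * np.ones((1, 10))
example_waveform = image_to_waveform(example_image)
```

With 65 rows, the sketch infers a 128-sample frame and a 32-sample hop. A learned vocoder, as named in the caption, would predict plausible phase and handle a mel-scaled frequency axis, which is why it yields natural sounds where this version does not.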