NeurIPS 2024
Images that Sound: Composing Images and Sounds on a Single Canvas
Ziyang Chen, Daniel Geng, Andrew Owens
Abstract
Project page: https://ificl.github.io/images-that-sound/

[Figure 1 placeholder. Rows, top to bottom: color image, spectrogram (axes: time, frequency), waveform (axis: time). Example prompt pairs: "castle with bell towers, grayscale, lithograph style" with "bell ringing"; "tigers, grayscale, black background" with "tiger growling".]

Figure 1: Images that sound. We use diffusion models to generate visual spectrograms (second row) that look like natural images, which we call images that sound. These spectrograms can be converted into natural sounds (third row) using a pretrained vocoder, or colorized to obtain more visually pleasing results (first row). Please refer to our website to listen to the sounds.
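To make the idea of "playing" an image concrete, the following is a minimal sketch and not the method described above: it swaps the pretrained vocoder for a naive inverse short-time Fourier transform with random phase, and it assumes the grayscale image is a linear-frequency magnitude spectrogram whose top row is the highest frequency. The function name `image_to_waveform` and the parameters `n_fft`, `hop`, and `seed` are hypothetical choices made for illustration.

```python
import numpy as np


def image_to_waveform(image, n_fft=1024, hop=256, seed=0):
    """Interpret a 2D grayscale array as a magnitude spectrogram and invert it.

    Assumptions (illustrative only): rows are frequency with the top row as the
    highest frequency, columns are time frames, and pixel intensity is linear
    magnitude. Phase is unknown, so random phase is used in place of a vocoder.
    """
    img = np.asarray(image, dtype=np.float64)
    if img.ndim != 2:
        raise ValueError("expected a 2D grayscale image")

    # Flip vertically so that row 0 corresponds to the lowest frequency.
    mag = img[::-1, :]

    # Resample the frequency axis to the n_fft // 2 + 1 bins expected by irfft.
    n_bins = n_fft // 2 + 1
    src = np.linspace(0.0, 1.0, mag.shape[0])
    dst = np.linspace(0.0, 1.0, n_bins)
    mag = np.stack(
        [np.interp(dst, src, mag[:, t]) for t in range(mag.shape[1])], axis=1
    )

    # Attach a random phase to every time-frequency cell.
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=mag.shape)
    spec = mag * np.exp(1j * phase)

    # Inverse FFT per frame, then windowed overlap-add into one signal.
    frames = np.fft.irfft(spec, n=n_fft, axis=0)  # shape: (n_fft, n_frames)
    window = np.hanning(n_fft)
    n_frames = frames.shape[1]
    out = np.zeros(hop * (n_frames - 1) + n_fft)
    for t in range(n_frames):
        out[t * hop : t * hop + n_fft] += window * frames[:, t]

    # Normalize to [-1, 1] unless the signal is silent.
    peak = np.max(np.abs(out))
    return out / peak if peak > 0 else out
```

Calling `image_to_waveform(img)` on a 2D array returns a one-dimensional float array that can be written to an audio file at a sample rate of the reader's choosing. Random-phase inversion typically sounds noisy and metallic compared with a learned vocoder, so this only illustrates the image-to-sound mapping, not the audio quality shown on the project page.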