CVPR 2023

Conditional Generation of Audio from Video via Foley Analogies

Yuexi Du, Ziyang Chen, Justin Salamon, Bryan C. Russell, Andrew Owens

Abstract

[Figure 1 panels: conditional video, conditional audio, input silent video, generated audio, input example's audio]

Figure 1. Conditional Foley generation via analogy. We generate a soundtrack for a silent input video, given a user-provided conditional example specifying what its audio should "sound like." In the first example, we make the xylophone strikes sound like the clicks of a mechanical keyboard. In the second, we generate a soundtrack for a video in which the drumstick striking a wooden door sounds as though it were made of metal. Notice that the shape of the sound events in the generated audio (e.g., thin stripes in the top example) matches the conditional audio, while the onsets match the input example's audio. For reference, we provide the input video's (held-out) sound on the right. We encourage the reader to watch and listen to the results on our project webpage.