CVPR 2025
AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers
Jiazhi Guan, Kaisiyuan Wang, Zhiliang Xu, Quanwei Yang, Yasheng Sun, Shengyi He, Borong Liang, Yukang Cao, Yingying Li, Haocheng Feng, Errui Ding, Jingdong Wang, Youjian Zhao, Hang Zhou, Ziwei Liu
Abstract
Figure 1. Zero-shot results by AudCast. Our method generates lifelike human videos in a realistic style at various resolutions, conditioned on any reference subject and driving audio. The synthesized videos exhibit natural, rhythmic motion and expressive facial expressions, with fine detail in both the face and the hands.