ICLR 2025

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

Haotian Tang, Yecheng Wu, Shang Yang, Enze Xie, Junsong Chen, Junyu Chen, Zhuoyang Zhang, Han Cai, Yao Lu, Song Han

Abstract

Figure 1: HART is an early autoregressive model that can directly generate 1024×1024 images with quality comparable to diffusion models, while offering significantly improved efficiency. Compared to state-of-the-art diffusion models, it achieves 4.5-7.7× higher throughput, 3.1-5.9× lower latency (measured on A100), and 6.9-13.4× lower MACs. Check out our online demo and video.