AAAI 2026
SCAN: Self-Calibrated AutoregressioN for High-Quality Visual Generation
Zhanzhou Feng, Qingpei Guo, Jingdong Chen, Feng Gao, Ming Yang, Shiliang Zhang
Abstract
Human artists continuously refine their coarse sketches as they work. Existing autoregressive generation is quite different: once a token is sampled, it is fixed. To allow generated content to be refined flexibly, this paper presents a Self-Calibrated AutoregressioN (SCAN) model that evaluates and refines its own generation quality without regenerating the entire image. We unify image token generation and quality evaluation in a single autoregressive model by formulating both tasks as categorical prediction problems. During inference, the model first generates a coarse initial image, then iteratively refines the lowest-quality patches until the image quality is satisfactory. Experimental results demonstrate that SCAN handles diverse real-world generation errors effectively and achieves a promising balance between image quality and speed. For example, SCAN-XL achieves an FID of 2.10 and an IS of 326.1, improving on LlamaGen-XL by 1.29 (38%) in FID and 99.0 (43.6%) in IS, with a 5.6× speedup (19.76s to 3.56s). Compared with recent works, SCAN improves FID and speed by 18.3% and 23% over VAR-d20, and by 7% and 46% over RandAR-XL.
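The inference procedure described in the abstract (generate a coarse result, score it, re-sample the weakest patches, repeat) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `refine_lowest_quality`, `toy_score`, and `toy_resample`, the scalar per-patch score, the stopping `threshold`, the number of patches `k` refined per round, and the `max_rounds` cap are all assumptions made for illustration. In SCAN a single autoregressive model performs both generation and quality evaluation; here two toy stand-in functions play those roles.

```python
from typing import Callable, List, Sequence


def refine_lowest_quality(
    tokens: Sequence[int],
    score_fn: Callable[[Sequence[int]], List[float]],
    resample_fn: Callable[[Sequence[int], int], int],
    threshold: float = 0.5,   # assumed stopping criterion
    k: int = 2,               # assumed number of patches refined per round
    max_rounds: int = 8,      # assumed cap on refinement rounds
) -> List[int]:
    """Re-sample the k lowest-scoring positions each round.

    Stops when every position scores at least `threshold`
    or after `max_rounds` rounds. The input is not modified.
    """
    tokens = list(tokens)
    for _ in range(max_rounds):
        scores = score_fn(tokens)
        if min(scores) >= threshold:
            break
        # Indices of the k lowest-scoring positions.
        worst = sorted(range(len(tokens)), key=scores.__getitem__)[:k]
        for i in worst:
            if scores[i] < threshold:
                tokens[i] = resample_fn(tokens, i)
    return tokens


# Toy stand-ins (hypothetical): a position is "good" only if it matches TARGET.
TARGET = [3, 1, 4, 1, 5, 9, 2, 6]


def toy_score(tokens: Sequence[int]) -> List[float]:
    return [1.0 if t == g else 0.0 for t, g in zip(tokens, TARGET)]


def toy_resample(tokens: Sequence[int], i: int) -> int:
    return TARGET[i]


coarse = [3, 0, 4, 0, 5, 0, 2, 6]   # three "bad" positions
refined = refine_lowest_quality(coarse, toy_score, toy_resample)
one_round = refine_lowest_quality(coarse, toy_score, toy_resample, max_rounds=1)
```

With these toy functions, only the three low-scoring positions are ever re-sampled, and the remaining tokens are left untouched. That is the property the abstract contrasts with regenerating the whole image.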