CVPR 2025

RandAR: Decoder-only Autoregressive Visual Generation in Random Orders

Ziqi Pang, Tianyuan Zhang, Fujun Luan, Yunze Man, Hao Tan, Kai Zhang, William T. Freeman, Yu-Xiong Wang

Abstract

[Figure 1 panels: (a) Random Order Generation; (b) Parallel Decoding (2.5× Accelerated); (c) Inpainting and Class-conditional Editing (objective class: "Meerkat"); outpainting example with Input Image (1×) and Outpainted Region (3×).]

Figure 1. Our RandAR enables GPT-style causal decoder-only transformers to generate images via random-order next-token prediction, which entirely removes the raster-order sequencing inductive bias of previous decoder-only models. RandAR not only (a) generates images of comparable quality, but also exhibits several zero-shot capabilities: (b) parallel decoding for acceleration, (c) inpainting, (d) outpainting, and (e) zero-shot generalization, in which a model trained at 256×256 synthesizes higher-resolution images. Zoom in for image details.
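The caption describes generation as next-token prediction over a randomly permuted token sequence rather than a fixed raster scan. The sketch below illustrates only the ordering logic under stated assumptions; it is not the authors' implementation and contains no model. Tokens are assumed to be integer IDs stored in raster order, and all function names (`random_order_sequence`, `restore_raster`, `next_token_pairs`, `conditioned_order`, `parallel_steps`) are hypothetical. We also assume the model is told which raster position it must predict next, since a random order does not imply it; how RandAR encodes that position is not described in this excerpt.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical helpers: they illustrate token ordering only, not a trained model.

def random_order_sequence(tokens: Sequence[int], seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle raster-order tokens; order[i] is the raster index of the i-th generated token."""
    rng = random.Random(seed)
    order = list(range(len(tokens)))
    rng.shuffle(order)
    return order, [tokens[p] for p in order]

def restore_raster(order: Sequence[int], reordered: Sequence[int]) -> List[Optional[int]]:
    """Invert the permutation so generated tokens land back on the image grid."""
    grid: List[Optional[int]] = [None] * len(order)
    for pos, tok in zip(order, reordered):
        grid[pos] = tok
    return grid

def next_token_pairs(order: Sequence[int], reordered: Sequence[int]):
    """Causal targets: (context of (position, token) pairs, query position, target token).
    Assumption: the next position to predict is given to the model as input."""
    return [(list(zip(order[:i], reordered[:i])), order[i], reordered[i])
            for i in range(len(order))]

def conditioned_order(known: Sequence[int], n_tokens: int, seed: int = 0) -> List[int]:
    """Inpainting/outpainting-style order: known positions form the causal prefix,
    and the remaining positions follow in random order."""
    rng = random.Random(seed)
    known_set = set(known)
    unknown = [p for p in range(n_tokens) if p not in known_set]
    rng.shuffle(unknown)
    return list(known) + unknown

def parallel_steps(n_tokens: int, tokens_per_step: int) -> List[List[int]]:
    """Group sequence indices so several tokens are decoded per forward pass."""
    return [list(range(s, min(s + tokens_per_step, n_tokens)))
            for s in range(0, n_tokens, tokens_per_step)]

if __name__ == "__main__":
    tokens = list(range(100, 116))  # toy 4x4 grid of token IDs in raster order
    order, seq = random_order_sequence(tokens, seed=1)
    assert restore_raster(order, seq) == tokens
    print(parallel_steps(10, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Because any order is allowed, placing already-known tokens first turns inpainting and outpainting into ordinary continuation of a prefix, which is one plausible reading of why these abilities come zero-shot. Grouping 4 tokens per step divides the number of decoding steps by 4 in this toy; the 2.5× acceleration in Figure 1 is the paper's own measurement and is not derived from this sketch.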