CVPR 2024

TIGER: Time-Varying Denoising Model for 3D Point Cloud Generation with Diffusion Process

Zhiyuan Ren, Minchul Kim, Feng Liu, Xiaoming Liu

Cited by 9

Abstract

Recently, diffusion models have emerged as a powerful new generative method for 3D point cloud generation tasks. However, few works study the effect of the diffusion model's architecture in the 3D point cloud setting, resorting instead to the typical UNet model developed for 2D images. Inspired by the wide adoption of Transformers, we study the complementary roles of convolution (from UNet) and attention (from Transformers). We discover that their respective importance changes according to the timestep in the diffusion process. At early stages, attention has an outsized influence because Transformers are found to generate the overall shape more quickly; at later stages, when fine detail is added, convolution starts having a larger impact on the local surface quality of the generated point cloud. In light of this observation, we propose a time-varying two-stream denoising model that combines convolution layers and Transformer blocks. We generate an optimizable mask from each timestep to reweigh global and local features, obtaining time-varying fused features. Experimentally, we demonstrate that our proposed method quantitatively outperforms other state-of-the-art methods in visual quality and diversity. Code is available at https://github.com/Zhiyuan-R/Tiger-Diffusion.
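
The time-varying fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the scalar sigmoid mask, the convex combination, and the names `timestep_mask`, `fuse`, `slope`, and `offset` are all assumptions made for illustration, with `slope` and `offset` standing in for whatever parameters the optimizable mask actually learns. The sketch also assumes the common DDPM convention that sampling runs from the largest timestep down to 0, so a large `t` corresponds to the early stage of generation.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def timestep_mask(t: int, num_steps: int, slope: float = 6.0, offset: float = 0.0) -> float:
    """Hypothetical mask: weight given to the global (attention) stream at timestep t.

    Assumption: t = num_steps - 1 is the noisiest step, i.e. the start of sampling.
    `slope` and `offset` are illustrative stand-ins for learnable parameters.
    """
    # Rescale t to the range [-0.5, 0.5] so the sigmoid is centered mid-process.
    u = t / (num_steps - 1) - 0.5
    return sigmoid(slope * u + offset)


def fuse(global_feat, local_feat, mask: float):
    """Convex combination of global (attention) and local (convolution) features."""
    return [mask * g + (1.0 - mask) * l for g, l in zip(global_feat, local_feat)]


if __name__ == "__main__":
    num_steps = 1000
    global_feat = [1.0, 1.0, 1.0]  # toy features from the attention stream
    local_feat = [0.0, 0.0, 0.0]   # toy features from the convolution stream
    for t in (999, 500, 0):
        m = timestep_mask(t, num_steps)
        print(t, round(m, 3), fuse(global_feat, local_feat, m))
```

With these illustrative defaults, the mask favors the attention stream at large timesteps and the convolution stream near `t = 0`, matching the qualitative behavior the abstract reports. In a trained model the mask would be learned rather than fixed by hand.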