NeurIPS 2024

Hallo3D: Multi-Modal Hallucination Detection and Mitigation for Consistent 3D Content Generation

Hongbo Wang, Jie Cao, Jin Liu, Xiaoqiang Zhou, Huaibo Huang, Ran He

Abstract

Recent advances in pretrained 2D diffusion models have significantly improved visual prior guidance for 3D content generation. However, this guidance often lacks geometric constraints, which leads to spatial perception hallucinations and multi-view inconsistencies. To address this, we introduce Hallo3D, a tuning-free method for 3D content generation that leverages the geometric perception capabilities of large multi-modal models to detect and mitigate these hallucinations. Our approach follows a generation-detection-correction paradigm: it uses multi-modal inconsistencies as query information to guide hallucination detection and to formulate enhanced negative prompts that keep renderings consistent. Additionally, we propose a denoising strategy that employs attention mechanisms to maintain consistent color and texture across multiple views during visual guidance. Our method is data-independent, integrates easily with existing 3D content generation frameworks, and supports both text-driven and image-driven approaches. Extensive experiments demonstrate that our method significantly improves the consistency and quality of generated 3D content, particularly by mitigating hallucinations common to pretrained 2D models.
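To illustrate how an enhanced negative prompt can steer a diffusion prior, the sketch below uses standard classifier-free guidance in which the negative-prompt noise prediction takes the place of the unconditional one. It is a minimal sketch under stated assumptions, not Hallo3D's implementation: `guided_noise`, `build_negative_prompt`, the guidance scale, and the example hallucination descriptions are all hypothetical, and the noise predictions are plain NumPy arrays standing in for a diffusion model's outputs.

```python
import numpy as np


def guided_noise(eps_pos: np.ndarray, eps_neg: np.ndarray, scale: float) -> np.ndarray:
    """Classifier-free guidance with a negative prompt.

    The prediction conditioned on the negative prompt replaces the
    unconditional prediction, so each step moves away from the negative
    description and toward the positive one.
    """
    return eps_neg + scale * (eps_pos - eps_neg)


def build_negative_prompt(base_negative: str, detected_issues: list[str]) -> str:
    """Hypothetical helper: append hallucination descriptions (assumed to come
    from a multi-modal model's detection step) to a base negative prompt."""
    terms = [base_negative] + list(detected_issues)
    # Drop empty strings so no stray separators appear.
    return ", ".join(t for t in terms if t)


# Illustrative usage with made-up values.
eps = guided_noise(np.array([1.0, 2.0]), np.array([0.5, 0.5]), scale=7.5)
prompt = build_negative_prompt("blurry", ["extra face on back", "duplicated legs"])
```

In this form, raising `scale` pushes each denoising step further from whatever the negative prompt describes. That is why folding detected hallucinations into the negative prompt can suppress them without any fine-tuning.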