CVPR 2025

MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects

Lei Fan, Dongdong Fan, Zhiguang Hu, Yiwen Ding, Donglin Di, Kai Yi, Maurice Pagnucco, Yang Song

Abstract

Figure 1. Overview of MANTA. The dataset consists of a visual component and a text component. The visual part includes over 137K multi-view images spanning five domains. The text part is divided into two subsets: Declarative Knowledge, comprising 875 words describing common anomalies, and Constructivist Learning, which includes 2K image-text multiple-choice questions.