CVPR 2025
MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects
Lei Fan, Dongdong Fan, Zhiguang Hu, Yiwen Ding, Donglin Di, Kai Yi, Maurice Pagnucco, Yang Song
Abstract
Figure 1. Overview of MANTA. The dataset consists of a visual component and a text component. The visual component includes over 137K multi-view images spanning five domains. The text component is divided into two subsets: Declarative Knowledge, comprising 875 words describing common anomalies, and Constructivist Learning, which includes 2K image-text multiple-choice questions.