AAAI 2026

IAD-R1: Reinforcing Consistent Reasoning in Industrial Anomaly Detection

Yanhui Li, Yunkang Cao, Chengliang Liu, Yuan Xiong, Xinghui Dong, Chao Huang

Abstract

[Figure: IAD-R1 overview. An initial model is trained with PA-SFT on the high-quality CoT dataset (Expert-AD), then with SC-GRPO, which scores sampled answers (A1: 1.00, A2: 1.25, ……: 0.50, An: 0.90) using the rewards R_con, R_loc, R_type, and R_acc together with a KL term. For the question "Are there any defects in this query image?", the output format is <think>...</think><location>top</location><type>contamination</type><answer>yes</answer>. Plot labels: Perception, Interpretation; SC-GRPO (on MVTec), SC-GRPO (on VisA), PA-SFT (on MVTec), PA-SFT (on VisA); LLaVA-1.5-7B, LLaVA-1.6.]

[Figure: example on a zipper test image. Question: "Are there any defects in the test image?" GT: Yes. IAD-R1 (ours) reasons that the zipper teeth are misaligned relative to the normal image and answers <location>center</location><type>Deformation</type><answer>Yes</answer>. Claude-Sonnet-4 answers "No, I don't observe any obvious defects."]

* Corresponding author. Preprint. Under review.

... a meticulously constructed high-quality Chain-of-Thought dataset (Expert-AD) for training, enhancing anomaly perception capabilities and establishing reasoning-to-answer correlations; the Structured Control Group Relative Policy Optimization (SC-GRPO) stage employs carefully designed reward functions to achieve a capability leap from "Anomaly Perception" to "Anomaly Interpretation". Experimental results demonstrate that IAD-R1 achieves significant improvements across 7 VLMs; the largest improvement is on the DAGM dataset, where average accuracy is 43.3% higher than that of the 0.5B baseline.
Notably, the 0.5B parameter model trained with IAD-R1 surpasses commercial models including GPT-4.1 and Claude-Sonnet-4 in zero-shot settings, demonstrating the effectiveness and superiority of IAD-R1. The dataset, code, and all model weights will be publicly available at https://github.com/Yanhui-Lee/IAD-R1.
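
To make the structured output and the group-relative scoring more concrete, the following is a minimal Python sketch, not the authors' implementation. The tag names follow the output format quoted in the paper's figure (`<think>`, `<location>`, `<type>`, `<answer>`); the function names `parse_response` and `group_relative_advantages` are hypothetical, and the normalization shown is the generic GRPO-style one (reward minus group mean, divided by group standard deviation), with the population standard deviation chosen as an assumption. The abstract does not define the rewards themselves, so none are implemented here.

```python
import re
from statistics import mean, pstdev

# Tag order taken from the example output in the figure; the regex is an assumption.
TAG_PATTERN = re.compile(
    r"<think>(?P<think>.*?)</think>\s*"
    r"<location>(?P<location>.*?)</location>\s*"
    r"<type>(?P<type>.*?)</type>\s*"
    r"<answer>(?P<answer>.*?)</answer>",
    re.DOTALL,
)


def parse_response(text):
    """Return the four tagged fields as a dict, or None if the format is not followed."""
    match = TAG_PATTERN.search(text)
    if match is None:
        return None
    # Normalize whitespace and case so "top " and "Yes" compare cleanly.
    return {name: value.strip().lower() for name, value in match.groupdict().items()}


def group_relative_advantages(rewards, eps=1e-8):
    """Generic GRPO-style normalization of rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example response in the format shown in the figure.
sample = (
    "<think>...</think><location>top </location>"
    "<type>contamination </type><answer>yes</answer>"
)
parsed = parse_response(sample)

# Example group of scalar rewards, using the values shown in the figure.
advantages = group_relative_advantages([1.00, 1.25, 0.50, 0.90])
```

In this sketch, a response that does not follow the tag structure yields `None`, which a format-sensitive reward could penalize; answers scoring above the group mean receive positive advantages and those below receive negative ones.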