ICCV2021

CrackFormer: Transformer Network for Fine-Grained Crack Detection

Huajun Liu, Xiangyu Miao, Christoph Mertz, Chengzhong Xu, Hui Kong

195 citations

Abstract

Cracks are irregular line structures that are of interest in many computer vision applications. Crack detection (e.g., from pavement images) is a challenging task due to intensity in-homogeneity, topology complexity, low contrast and noisy background. The overall crack detection accuracy can be significantly affected by the detection performance on fine-grained cracks. In this work, we propose a Crack Transformer network (CrackFormer) for fine-grained crack detection. The CrackFormer is composed of novel attention modules in a SegNet-like encoder-decoder architecture. Specifically, it consists of novel self-attention modules with 1x1 convolutional kernels for efficient contextual information extraction across feature-channels, and efficient positional embedding to capture large receptive field contextual information for long range interactions. It also introduces new scaling-attention modules to combine outputs from the corresponding encoder and decoder blocks to suppress nonsemantic features and sharpen semantic ones. The Crack-Former is trained and evaluated on three classical crack datasets. The experimental results show that the Crack-Former achieves the Optimal Dataset Scale (ODS) values of 0.871, 0.877 and 0.881, respectively, on the three datasets and outperforms the state-of-the-art methods.