CVPR2024

Kernel Adaptive Convolution for Scene Text Detection via Distance Map Prediction

Jinzhi Zheng, Heng Fan, Libo Zhang

被引用 14 次

摘要

Segmentation-based scene text detection algorithms that are accurate to the pixel level can satisfy the detection of arbitrary shape scene text and have received widespread attention. On the one hand, due to the complexity and di-versity of the scene text, the convolution with a fixed kernel size has some limitations in extracting the visual features of the scene text. On the other hand, most of the existing segmentation-based algorithms only segment the center of the text, losing information such as the edges and directions of the text, with limited detection accuracy. There are also some improved algorithms that use iterative cor-rections or introduce other multiple information to improve text detection accuracy but at the expense of efficiency. To address these issues, this paper proposes a simple and effective scene text detection method, the Kernel Adaptive Con-volution, which is designed with a Kernel Adaptive Con-volution Module for scene text detection via predicting the distance map. Specifically, first, we design an extensible kernel adaptive convolution module (KACM) to extract vi-sual features from multiple convolutions with different ker-nel sizes in an adaptive manner. Secondly, our method pre-dicts the text distance map under the supervision of a pri-ori information (including direction map, and foreground segmentation map) and completes the text detection from the predicted distance map. Experiments on four publicly available datasets prove the effectiveness of our algorithm, in which the accuracy and efficiency of both the Total- Text and TD500 outperform the state-of-the-art algorithm. The algorithm efficiency is improved while the accuracy is com-petitive on ArT and CTW1500.