EMNLP2025
CalligraphicOCR for Chinese Calligraphy Recognition
Xiaoyi Bao, Zhongqing Wang, Jinghang Gu, Chu-Ren Huang
Abstract
With thousands of years of history, calligraphy serves as one of the representative symbols of Chinese culture. A growing body of work tries to digitize calligraphy by recognizing its content for better preservation and propagation. However, previous work is limited to isolated single-character recognition, which not only requires impractical manual splitting into characters but also discards the rich context information that could serve as a supplementary signal. To this end, we construct a pioneering end-to-end calligraphy recognition benchmark dataset. The dataset is challenging both visually, because of variations such as different writing styles, and textually, because of issues such as the domain shift in semantics. We further propose CalligraphicOCR (COCR), equipped with calligraphic image augmentation and an action-based corrector that target the root causes of these challenges. Experiments demonstrate the advantage of our proposed model over cutting-edge baselines, underscoring the necessity of introducing this new setting and laying a solid foundation for protecting and propagating these already scarce resources. The code and data are available at https://github.com/HoraceXIaoyiBao/COCR-EMNLP2025
* Zhongqing Wang and Jinghang Gu are the corresponding authors.
(Long absent, I miss you deeply. Summer is serene, how fare you? Summoned by duty in old age, I cannot stay. A humble gift of rice conveys my regard. Take care.)