ACL 2025
Exploiting Phonetics and Glyph Representation at Radical-level for Classical Chinese Understanding
Junyi Xiang, Maofu Liu
Abstract
The diachronic gap between classical and modern Chinese arises from centuries of language evolution, with cumulative changes in the phonological, lexical, and syntactic systems. The resulting semantic variation poses significant challenges for the computational modeling of historical texts. Current methods typically improve pre-trained language models' understanding of classical Chinese through corpus pre-training or semantic integration. However, they overlook the synergistic relationship between the phonetic and glyph features of Chinese characters, which is a critical factor in deciphering a character's semantics. In this paper, we propose a radical-level phonetics and glyph representation enhanced Chinese model (RPGCM) with strong fine-grained semantic modeling capabilities. Our model builds robust contextualized representations through: (1) rule-based radical decomposition and byte pair encoding (BPE) based radical aggregation for structural pattern recognition, (2) phonetic-glyph semantic mapping, and (3) dynamic semantic fusion. Experimental results on the CCMRC, WYWEB, and C³Bench benchmarks demonstrate RPGCM's superiority and validate that explicit radical-level modeling mitigates semantic variation.
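To make step (1) of the abstract more concrete, the following is a minimal sketch of how radical decomposition followed by BPE-style aggregation could work. It is not the authors' implementation: the decomposition table `DECOMP`, the helper names, and the stopping rule (stop once no adjacent pair occurs at least twice) are all assumptions chosen for illustration, and the table is a simplified toy covering five characters.

```python
from collections import Counter

# Hypothetical toy decomposition table (an assumption for illustration only);
# a real system would rely on a full rule-based decomposition resource.
DECOMP = {
    "江": ["氵", "工"],
    "河": ["氵", "可"],
    "湖": ["氵", "古", "月"],
    "胡": ["古", "月"],
    "明": ["日", "月"],
}


def decompose(char):
    """Return the radical sequence for a character, or the character itself."""
    return list(DECOMP.get(char, [char]))


def count_pairs(seqs):
    """Count adjacent radical pairs across all sequences."""
    pairs = Counter()
    for seq in seqs:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
    return pairs


def merge_pair(seq, pair):
    """Replace each occurrence of the adjacent pair with one merged unit."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_merges(chars, num_merges):
    """Greedy BPE over radical sequences: repeatedly merge the most frequent pair."""
    seqs = [decompose(c) for c in chars]
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(seqs)
        if not pairs:
            break
        best, freq = pairs.most_common(1)[0]
        if freq < 2:  # assumed stopping rule: no pair is shared by two contexts
            break
        merges.append(best)
        seqs = [merge_pair(s, best) for s in seqs]
    return merges, seqs


merges, seqs = learn_merges(["江", "河", "湖", "胡", "明"], num_merges=5)
print(merges)
print(seqs)
```

In this toy vocabulary the only pair that recurs is 古 + 月, so it is merged into a single unit shared by 湖 and 胡, which illustrates how aggregation can recover a recurring component from flat radical sequences.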