EMNLP 2025
BIRD: Bronze Inscription Restoration and Dating
Wenjie Hua, Hoang H. Nguyen, Gangyan Ge
Abstract
Bronze inscriptions from early China are often fragmentary, with missing or undeciphered characters and uncertain chronological assignments. To address these problems, we propose BIRD (Bronze Inscription Restoration and Dating), a dataset and framework that uses pretrained language models (PLMs) adapted to the demands of ancient texts. By combining domain-adaptive pretraining (DAPT) and task-adaptive pretraining (TAPT) with a glyph net resource that links graphemes to their allographs, our approach addresses two key challenges: the low-resource setting and the prevalence of allography. Our results show marked improvements in both restoration and dating accuracy.