KDD2022
OAG-BERT: Towards a Unified Backbone Language Model for Academic Knowledge Services
Xiao Liu, Da Yin, Jingnan Zheng, Xingjian Zhang, Peng Zhang, Hongxia Yang, Yuxiao Dong, Jie Tang
24 citations
Abstract
Academic Knowledge Services have substantially facilitated the development of human science and technology, providing a plenitude of useful research tools. However, many applications highly depend on ad-hoc models and expensive human labeling to understand professional contents, hindering deployments in real world. To create a unified backbone language model for various knowledge-intensive academic knowledge mining challenges, based on the world's largest public academic graph Open Academic Graph (OAG), we pre-train an academic language model, namely OAG-BERT, to integrate massive heterogeneous entity knowledge beyond scientific corpora. We develop novel pre-training strategies along with zero-shot inference techniques. OAG-BERT's superior performance on 9 knowledge-intensive academic tasks (including 2 demo applications) demonstrates its qualification to serve as a foundation for academic knowledge services. Its zero-shot capability also offers great potential to mitigate the need of costly annotations. OAG-BERT has been deployed to multiple real-world applications, such as reviewer recommendations for NSFC (National Nature Science Foundation of China) and paper tagging in the AMiner system. All codes and pre-trained models are available via the CogDL.