ACL 2023

CoMave: Contrastive Pre-training with Multi-scale Masking for Attribute Value Extraction

Xinnan Guo, Wentao Deng, Yongrui Chen, Yang Li, Mengdi Zhou, Guilin Qi, Tianxing Wu, Dong Yang, Liubin Wang, Yong Pan

Cited by 1

Abstract

Attribute Value Extraction (AVE) aims to automatically obtain attribute-value pairs from product descriptions to aid e-commerce. Although existing approaches have steadily improved performance on e-commerce platforms, they still face two challenges: 1) difficulty identifying values at different scales simultaneously; 2) confusion between highly similar fine-grained attributes. This paper proposes a pre-training technique for AVE to address these issues. First, we improve the conventional token-level masking strategy, guiding the language model to understand multi-scale values by recovering masked spans at the phrase and sentence levels. Second, we apply clustering to build a challenging negative set for each example and design a contrastive pre-training objective that forces the model to discriminate between similar attributes. Comprehensive experiments show that our solution significantly improves over traditional pre-trained models on the AVE task and achieves state-of-the-art results on four benchmarks.
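The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the masking procedure or the loss, so the sketch assumes a single contiguous masked span whose length sets the scale, and an InfoNCE-style contrastive loss over cosine similarities. The names `mask_span`, `cosine`, and `info_nce`, the `[MASK]` token, and the temperature value are hypothetical. The clustering step that selects hard negatives is omitted; negatives are simply passed in.

```python
import math
import random


def mask_span(tokens, span_len, rng, mask_token="[MASK]"):
    """Replace one contiguous span of `span_len` tokens with mask tokens.

    A short span stands in for phrase-level masking, a long one for
    sentence-level masking (an assumption made for illustration).
    Returns the masked sequence and the original span to be recovered.
    """
    if not 0 < span_len <= len(tokens):
        raise ValueError("span_len must be between 1 and len(tokens)")
    start = rng.randrange(len(tokens) - span_len + 1)
    end = start + span_len
    masked = tokens[:start] + [mask_token] * span_len + tokens[end:]
    return masked, tokens[start:end]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: negative log-softmax of the positive's similarity.

    Assumed here as a common contrastive objective; the abstract only says
    the objective is "based on contrastive learning".
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    top = max(logits)  # subtract the max for numerical stability
    log_sum = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_sum - logits[0]


# Usage: mask a two-token phrase, then compare a hard and an easy negative.
rng = random.Random(0)
tokens = "red cotton shirt with long sleeves".split()
masked, target = mask_span(tokens, span_len=2, rng=rng)

anchor, positive = [1.0, 0.0], [0.9, 0.1]
loss_hard = info_nce(anchor, positive, negatives=[[0.8, 0.2]])
loss_easy = info_nce(anchor, positive, negatives=[[-1.0, 0.0]])
```

In this toy setting the negative that lies close to the anchor yields a larger loss than the distant one. That is the intuition behind building a challenging negative set: similar attributes give the model a stronger training signal than unrelated ones.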