KDD2020

Automatic Validation of Textual Attribute Values in E-commerce Catalog by Learning with Limited Labeled Data

Yaqing Wang, Yifan Ethan Xu, Xian Li, Xin Luna Dong, Jing Gao

11 citations

Abstract

Product catalogs are valuable resources for eCommerce website. In the catalog, a product is associated with multiple attributes whose values are short texts, such as product name, brand, functionality and flavor. Usually individual retailers self-report these key values, and thus the catalog information unavoidably contains noisy facts. It is very important to validate the correctness of these values in order to improve shopper experiences and enable more effective product recommendation. Due to the huge volume of products, an effective automatic validation approach is needed. In this paper, we propose to develop an automatic validation approach that verifies the correctness of textual attribute values for products. This can be formulated as a task as cross-checking a textual attribute value against product profile, which is a short textual description of the product on eCommerce website. Although existing deep neural network models have shown success in conducting cross-checking between two pieces of texts, their success has to be dependent upon a large set of quality labeled data, which are hard to obtain in this validation task: products span a variety of categories. Due to the category difference, annotation has to be done on all the categories, which is impossible to achieve in real practice.