KDD2022

Design Domain Specific Neural Network via Symbolic Testing

Hui Li, Xing Fu, Ruofan Wu, Jinyu Xu, Kai Xiao, Xiaofu Chang, Weiqiang Wang, Shuai Chen, Leilei Shi, Tao Xiong, Yuan Qi

2 citations

Abstract

Deep sequence networks such as multi-head self-attention networks provide a promising way to extract effective representations from raw sequence data in an end-to-end fashion and have shown great success in various domains such as natural language processing, computer vision, $etc$ . However, in domains such as financial risk management and anti-fraud where expert-derived features are heavily relied on, deep sequence models struggle to dominate the game.In this paper, we introduce a simple framework called symbolic testing to verify the learnability of certain expert-derived features over sequence data. A systematic investigation over simulated data reveals the fact that the self-attention architecture fails to learn some standard symbolic expressions like the count distinct operation. To overcome this deficiency, we propose a novel architecture named SHORING, which contains two components:event network andsequence network. Theevent network efficiently learns arbitrary high-orderevent-level conditional embeddings via a reparameterization trick while thesequence network integrates domain-specific aggregations into the sequence-level representation, thereby providing richer inductive biases compare to standard sequence architectures like self-attention. We conduct comprehensive experiments and ablation studies on synthetic datasets that mimic sequence data commonly seen in anti-fraud domain and three real-world datasets. The results show that SHORING learns commonly used symbolic features well, and experimentally outperforms the state-of-the-art methods by a significant margin over real-world online transaction datasets. The symbolic testing framework and SHORING have been applied in anti-fraud model development at Alipay and improved performance of models for real-time fraud-detection.