KDD2023

Removing Camouflage and Revealing Collusion: Leveraging Gang-crime Pattern in Fraudster Detection

Lewen Wang, Haozhe Zhao, Cunguang Feng, Weiqing Liu, Congrui Huang, Marco Santoni, Manuel Cristofaro, Paola Jafrancesco, Jiang Bian

6 citations

DOI Publisher

Abstract

As one of the major threats to the healthy development of various online platforms, fraud has become increasingly committed in the form of gangs since collusive fraudulent activities are much easier to obtain illicit benefits with lower exposure risk. To detect fraudsters in a gang, spatio-temporal graph neural network models have been widely applied to detect both temporal and spatial collusive patterns. However, a closer peek into real-world records of fraudsters can reveal that fraud gangs usually conduct community-level camouflage, specified by two types, i.e., temporal and spatial camouflage. Such camouflage can disguise gangs as benign communities by concealing collusive patterns and thus deceiving many existing graph neural network models. In the meantime, many existing graph neural network models suffer from the challenge of extreme sample imbalance caused by rare fraudsters hidden among massive users. To handle all these challenges, in this paper, we propose a generative adversarial network framework, named Adversarial Camouflage Detector, to detect fraudsters. Concretely, this ACD framework consists of four modules, in charge of community division, camouflage identification, fraudster detection, and camouflage generation, respectively. The first three modules form up a discriminator that uses spatio-temporal graph neural networks as the foundation model and enhance fraudster detection by amplifying the gangs' collusive patterns through automatically identifying and removing camouflage. Meanwhile, the camouflage generation module plays as the generator role that generates fraudsters samples by competing against the discriminator to alleviate the challenge of sample imbalance and increase the model robustness. The experimental result shows that our proposed method outperforms other methods on real-world datasets.