NDSS2018

De-anonymization of Mobility Trajectories: Dissecting the Gaps between Theory and Practice

Huandong Wang, Chen Gao, Yong Li, Gang Wang, Depeng Jin, Jingbo Sun

39 citations

Abstract

Human mobility trajectories are increasingly collected by ISPs to assist academic research and commercial applications. Meanwhile, there is a growing concern that individual trajectories can be de-anonymized when the data is shared, using information from external sources (e.g. online social networks). To understand this risk, prior works either estimate the theoretical privacy bound or simulate de-anonymization attacks on synthetically created (small) datasets. However, it is not clear how well the theoretical estimations are preserved in practice. In this paper, we collected a large-scale ground-truth trajectory dataset from 2,161,500 users of a cellular network, and two matched external trajectory datasets from a large social network (56,683 users) and a check-in/review service (45,790 users) on the same user population. The two sets of large ground-truth data provide a rare opportunity to extensively evaluate a variety of de-anonymization algorithms (7 in total). We find that their performance in the real-world dataset is far from the theoretical bound. Further analysis shows that most algorithms have underestimated the impact of spatio-temporal mismatches between the data from different sources, and the high sparsity of user generated data also contributes to the underperformance. Based on these insights, we propose 4 new algorithms that are specially designed to tolerate spatial or temporal mismatches (or both) and model user behavior. Extensive evaluations show that our algorithms achieve more than 17% performance gain over the best existing algorithms, confirming our insights.