WWW2026
Xemis: Fair and Robust Privacy-Preserving Data Trading based on Distributed Noise Sharing
Xinxin Xing, Yizhong Liu, Ruonan Chen, Banghong Qin, Wangjie Qiu, Jianwei Liu, Qianhong Wu, Willy Susilo, Robert H. Deng
Abstract
Privacy-preserving data trading allows data owners to sell data to consumers through a data trading web platform, the data market, without disclosing sensitive information in the raw data. It enables legitimate data transmission and aggregation, facilitating large-scale data-driven model training. However, existing differential privacy-based approaches struggle to inject precisely calibrated noise in a trustworthy manner without revealing raw data to a third party. As a result, they fail to achieve strong fairness and controllable privacy simultaneously, especially when facing malicious external adversaries or a corrupted data market. To address these issues, we present Xemis, a decentralized privacy-preserving data trading scheme built upon a new multi-party computation (MPC)-based batched distributed noise sharing protocol, b-DNS. b-DNS generates threshold secret shares of Gaussian samples with low overhead and Byzantine robustness. Leveraging our game-theoretic mechanism and batched bits expansion mechanism, b-DNS is at least 12.7× faster and uses at least 43.6× less bandwidth than ODO (EUROCRYPT'06) and CSU19 (CCS'19). Building atop b-DNS, Xemis further enables a decentralized, Byzantine-robust data market to perturb the shares of blinded data under a controllable, precise noise level, without revealing the raw data or the perturbed data to the market. Additionally, Xemis uses a distributed demo dataset sampling-based mechanism and a Byzantine fault tolerance consensus-based method to enable fair value assessment and payment-data delivery. Consequently, Xemis achieves controllable privacy and strong fairness with Web3 compatibility under malicious market nodes, while running at least 3.3× faster than ZLM+24 (TIFS'24) with a 64-node market.
The utility of noise-perturbed data is evaluated through image classification tasks on CIFAR-10: when noise with privacy budget ε = 1 is added to 63% of the training data, the model maintains an accuracy of 83%.
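To make the idea of perturbing shares without exposing the raw or perturbed value more concrete, the following is a minimal illustrative sketch and not the b-DNS protocol itself. It assumes honest nodes, plain additive (n-of-n) sharing instead of threshold sharing, locally sampled Gaussian noise, and a non-cryptographic random number generator. The modulus `PRIME`, the fixed-point factor `SCALE`, and all function names are hypothetical choices made for illustration. Each node adds an independent Gaussian contribution with variance sigma^2 / n to its own share, so the reconstructed value carries noise with total variance sigma^2, while no single node ever holds the raw or the perturbed value.

```python
import math
import random

PRIME = 2**61 - 1   # field modulus (assumed for illustration)
SCALE = 10**6       # fixed-point scaling factor (assumed for illustration)


def encode(x: float) -> int:
    """Map a real number to a field element via fixed-point encoding."""
    return round(x * SCALE) % PRIME


def decode(v: int) -> float:
    """Map a field element back to a real number; the upper half is negative."""
    if v > PRIME // 2:
        v -= PRIME
    return v / SCALE


def share(value: int, n: int, rng: random.Random) -> list:
    """Split a field element into n additive shares that sum to it mod PRIME."""
    parts = [rng.randrange(PRIME) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % PRIME)
    return parts


def reconstruct(shares: list) -> int:
    """Recombine additive shares into the underlying field element."""
    return sum(shares) % PRIME


def perturb_shares(data_shares: list, sigma: float, rng: random.Random) -> list:
    """Each node adds Gaussian noise of variance sigma^2 / n to its own share.

    The sum of n independent contributions is Gaussian with variance sigma^2.
    """
    n = len(data_shares)
    per_node_sigma = sigma / math.sqrt(n)
    return [
        (s + encode(rng.gauss(0.0, per_node_sigma))) % PRIME
        for s in data_shares
    ]


if __name__ == "__main__":
    rng = random.Random(2026)          # seeded only to make the demo repeatable
    shares = share(encode(3.25), 4, rng)
    noisy = perturb_shares(shares, sigma=2.0, rng=rng)
    print("perturbed value:", decode(reconstruct(noisy)))
```

A real deployment would differ in exactly the places the abstract emphasizes: the noise shares would be generated jointly so that no node knows its own contribution, the sharing would be threshold-based, and misbehaving nodes would be tolerated.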