WWW2026

Xemis: Fair and Robust Privacy-Preserving Data Trading based on Distributed Noise Sharing

Xinxin Xing, Yizhong Liu, Ruonan Chen, Banghong Qin, Wangjie Qiu, Jianwei Liu, Qianhong Wu, Willy Susilo, Robert H. Deng


Abstract

Privacy-preserving data trading allows data owners to sell data to consumers through a data trading web platform, the data market, without disclosing sensitive information in the raw data. It enables legitimate data transmission and aggregation, facilitating large-scale data-driven model training. However, existing differential privacy-based approaches struggle to inject precisely calibrated noise in a trustworthy manner without revealing raw data to a third party. As a result, they fail to achieve strong fairness and controllable privacy simultaneously, especially when facing malicious external adversaries or a corrupted data market. To address these issues, we present Xemis, a decentralized privacy-preserving data trading scheme built upon b-DNS, a new multi-party computation (MPC)-based batched distributed noise sharing protocol. b-DNS generates threshold secret shares of Gaussian samples with low overhead and Byzantine robustness. Leveraging our game-theoretic mechanism and batched bits expansion mechanism, b-DNS is at least 12.7× faster and uses at least 43.6× less bandwidth than ODO (EUROCRYPT'06) and CSU19 (CCS'19). Building atop b-DNS, Xemis further enables a decentralized, Byzantine-robust data market to perturb the shares of blinded data at a controllable, precise noise level, without revealing either the raw data or the perturbed data to the market. Additionally, Xemis uses a distributed demo-dataset sampling mechanism and a Byzantine fault tolerance consensus-based method to enable fair value assessment and payment-data delivery. Consequently, Xemis achieves controllable privacy and strong fairness with Web3 compatibility under malicious market nodes, while being at least 3.3× faster than ZLM+24 (TIFS'24) with a 64-node market.
The utility of noise-perturbed data is evaluated through image classification on CIFAR-10: when noise with privacy budget ε = 1 is added to 63% of the training data, the model maintains an accuracy of 83%.
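The central idea of perturbing shares without revealing data or noise is that threshold shares of independently contributed noise can be summed locally and then added to shares of the data. The Python sketch below illustrates this idea. It is not b-DNS. It swaps in a much simpler scheme in which each of n parties samples its own N(0, σ²/n) contribution with an ordinary floating-point sampler and Shamir-shares it, and it omits the game-theoretic mechanism, the batched bits expansion, and any Byzantine robustness. The field modulus `PRIME`, the fixed-point `SCALE`, and every function name are hypothetical choices made for illustration.

```python
import random

# Hypothetical parameters chosen for illustration only (not from Xemis/b-DNS).
PRIME = 2**61 - 1   # Mersenne prime used as the share field
SCALE = 2**16       # fixed-point scale for encoding real values


def encode(x: float) -> int:
    """Map a real number to a field element via fixed-point rounding."""
    return round(x * SCALE) % PRIME


def decode(v: int) -> float:
    """Map a field element back to a signed real number."""
    if v > PRIME // 2:
        v -= PRIME
    return v / SCALE


def share(secret: int, t: int, n: int, rng: random.Random) -> list[tuple[int, int]]:
    """Shamir-share `secret` so that any t of the n shares reconstruct it."""
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of f(x)
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate f(0) from at least t shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def distributed_gaussian_shares(sigma: float, t: int, n: int, rng: random.Random):
    """Naive sketch: each party samples N(0, sigma^2/n) and shares it; summing
    the shares gives every party a share of noise distributed as N(0, sigma^2),
    up to fixed-point rounding. random.Random is NOT a secure sampler."""
    contributions = [encode(rng.gauss(0.0, sigma / n ** 0.5)) for _ in range(n)]
    dealt = [share(z, t, n, rng) for z in contributions]
    noise_shares = []
    for k in range(n):
        # Party k+1 adds up the k-th share it received from every dealer.
        y = sum(dealt[d][k][1] for d in range(n)) % PRIME
        noise_shares.append((k + 1, y))
    return noise_shares, contributions


def perturb(data_shares, noise_shares):
    """Add noise shares to data shares locally; Shamir sharing is linear."""
    out = []
    for (x, d), (x2, z) in zip(data_shares, noise_shares):
        assert x == x2, "shares must be aligned by evaluation point"
        out.append((x, (d + z) % PRIME))
    return out


if __name__ == "__main__":
    rng = random.Random(2026)
    n, t = 5, 3
    noise_shares, _ = distributed_gaussian_shares(sigma=1.0, t=t, n=n, rng=rng)
    data_shares = share(encode(3.5), t, n, rng)
    perturbed = perturb(data_shares, noise_shares)
    # Any t parties recover only the noisy value, never 3.5 itself.
    print(decode(reconstruct(perturbed[:t])))
```

Because Shamir sharing is linear, each node adds its noise share to its data share without interaction. Any t nodes can then reconstruct only the perturbed value, and fewer than t learn nothing. In this naive variant, colluding parties know their own contributions, so the secret part of the noise comes only from honest parties. A deployable protocol also needs a cryptographically secure sampler and defenses against dealers who deviate.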