SIGMOD 2025

B2Mark: A Blind and Buyer-Traceable Watermarking Scheme for Tabular Datasets

Yihao Zheng, Jinfei Liu, Kui Ren, Li Xiong

Abstract

Watermarking is an effective technique to prevent illegal copying of datasets in data markets. However, existing techniques for tabular datasets either fall short of the three fundamental goals (detectability, non-intrusiveness, and robustness) or lack blindness and buyer traceability. In this paper, we propose B2Mark, a blind and buyer-traceable watermarking scheme based on statistical hypothesis testing. To the best of our knowledge, this is the first blind watermarking scheme applicable to both numerical and categorical data. During embedding, B2Mark encodes a multi-bit watermark that serves as a buyer identifier, using a partition of the value domain (rather than the noise domain) and a cryptographic hash function. During detection, B2Mark uses a hypothesis-testing-based approach to extract the embedded buyer identifier without access to the original dataset, which makes detection both blind and buyer-traceable. Experiments on various datasets confirm that B2Mark achieves the three fundamental goals and can effectively and efficiently trace data leaks in multi-buyer scenarios.
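To make the idea concrete, the following is a minimal illustrative sketch of hash-keyed value-domain partitioning with a hypothesis-test detector for one numerical column. It is not the B2Mark algorithm itself, whose details the abstract does not give. Everything in it is an assumption made for illustration: the function names (`h`, `cell_label`, `nearest_cell`, `embed`, `detect`), the use of a stable row identifier to assign each row to a watermark bit, the fixed cell width, and the z-score threshold.

```python
import hashlib
import math
import random


def h(*parts):
    """SHA-256 of the joined parts as an integer (the secret key is one part)."""
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def cell_label(key, bit_idx, cell_idx):
    """Pseudo-random 0/1 label of one value-domain cell, per watermark bit."""
    return h(key, "cell", bit_idx, cell_idx) & 1


def nearest_cell(key, bit_idx, start, want, max_dist=64):
    """Closest cell to `start` whose label equals `want`."""
    for d in range(max_dist):
        for cand in (start - d, start + d):
            if cell_label(key, bit_idx, cand) == want:
                return cand
    raise ValueError("no matching cell found within max_dist")


def embed(rows, key, bits, width):
    """rows: list of (row_id, value). Returns a watermarked copy."""
    out = []
    for row_id, v in rows:
        i = h(key, "row", row_id) % len(bits)   # bit carried by this row
        c = math.floor(v / width)               # cell the value falls in
        target = nearest_cell(key, i, c, bits[i])
        if target != c:
            v = (target + 0.5) * width          # move to centre of target cell
        out.append((row_id, v))
    return out


def detect(rows, key, n_bits, width, z_threshold=3.0):
    """Blind extraction: needs only the key, not the original dataset."""
    ones = [0] * n_bits
    total = [0] * n_bits
    for row_id, v in rows:
        i = h(key, "row", row_id) % n_bits
        total[i] += 1
        ones[i] += cell_label(key, i, math.floor(v / width))
    result = []
    for i in range(n_bits):
        n = total[i]
        # Null hypothesis: labels behave like fair coin flips, Binomial(n, 1/2).
        z = (ones[i] - n / 2) / math.sqrt(n / 4) if n else 0.0
        result.append(1 if z > z_threshold else 0 if z < -z_threshold else None)
    return result


# Usage: embed a hypothetical 8-bit buyer identifier, then extract it blindly.
random.seed(0)
rows = [(i, random.uniform(0, 100)) for i in range(2000)]
buyer_id = [1, 0, 1, 1, 0, 0, 1, 0]
marked = embed(rows, "secret", buyer_id, 0.5)
recovered = detect(marked, "secret", len(buyer_id), 0.5)
```

In this toy version, a bit is reported only when the label counts deviate significantly from the fair-coin null hypothesis, so a bit can come back as `None` (undecided) when too few of its rows survive. The sketch modifies every row and covers only numerical values; a practical scheme would bound distortion more carefully and handle categorical attributes as well.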