VLDB 2025

OmniMatch: Joinability Discovery in Data Products

Christos Koutras, Jiani Zhang, Xiao Qin, Chuan Lei, Vassilis N. Ioannidis, Christos Faloutsos, George Karypis, Asterios Katsifodimos

Abstract

We propose OmniMatch, a novel joinability discovery technique tailored to the needs of data products: cohesive, curated collections of tabular datasets. OmniMatch combines multiple column-pair similarity measures using self-supervised Graph Neural Networks (GNNs). OmniMatch's GNN captures column relatedness by leveraging graph neighborhood information, significantly improving the recall of joinability discovery tasks. At the same time, OmniMatch increases its precision by augmenting its training data with negative column join examples through an automated negative example generation process. Compared to the state of the art, OmniMatch achieves up to 14% higher effectiveness in F1 score and AUC without relying on individual, user-provided thresholds for each similarity metric.
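
To make the idea of "multiple column-pair similarity measures" concrete, the following is a minimal sketch of how several similarity signals might be computed for one pair of columns. The three measures (Jaccard similarity over values, value containment, and column-name similarity) and all function names are illustrative assumptions; the abstract does not state which measures OmniMatch uses or how they are implemented.

```python
from difflib import SequenceMatcher


def jaccard(a, b):
    """Jaccard similarity of the distinct values of two columns."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def containment(a, b):
    """Fraction of distinct values of column a that also appear in column b."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa) if sa else 0.0


def name_similarity(name_a, name_b):
    """Character-level similarity of two column names, in [0, 1]."""
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()


def pair_features(name_a, values_a, name_b, values_b):
    """Collect several similarity signals for one column pair.

    Hypothetical helper: the choice of signals is an assumption made
    for illustration, not the feature set described in the paper.
    """
    return {
        "jaccard": jaccard(values_a, values_b),
        "containment": containment(values_a, values_b),
        "name": name_similarity(name_a, name_b),
    }


# Two toy columns that share half of their values.
features = pair_features(
    "customer_id", [1, 2, 3, 4],
    "cust_id", [3, 4, 5, 6],
)
print(features)
```

A threshold-based approach would need a separate cutoff for each of these signals. According to the abstract, OmniMatch instead combines such measures with a GNN, which is why no per-metric thresholds are required.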