EMNLP 2025

GraDaSE: Graph-Based Dataset Search with Examples

Jing He, Mingyang Lv, Qing Shi, Gong Cheng

Abstract

Dataset search is a specialized information retrieval task. In the emerging scenario of Dataset Search with Examples (DSE), the user submits a query together with a few target datasets that are known to be relevant, which serve as examples. The retrieved datasets are expected to be both relevant to the query and similar to the target datasets. Unlike existing text-based retrievers, we propose GraDaSE, a graph-based approach. Beyond the textual metadata of the datasets, we identify provenance-based and topic-based relationships among them to construct a graph, and we jointly encode the structural and textual information to rank candidate datasets. GraDaSE outperforms a variety of strong baselines on two test collections, including DataFinder-E, which we construct.