EMNLP 2025
GraDaSE: Graph-Based Dataset Search with Examples
Jing He, Mingyang Lv, Qing Shi, Gong Cheng
Abstract
Dataset search is a specialized information retrieval task. In the emerging scenario of Dataset Search with Examples (DSE), the user submits a query together with a few target datasets that are known to be relevant and serve as examples. The retrieved datasets are expected to be relevant to the query and also similar to the target datasets. In contrast to existing text-based retrievers, we propose GraDaSE, a graph-based approach. Besides the textual metadata of the datasets, we identify their provenance-based and topic-based relationships to construct a graph, and jointly encode structural and textual information to rank candidate datasets. GraDaSE outperforms a variety of strong baselines on two test collections, including DataFinder-E, which we construct.