KDD 2023

PSLOG: Pretraining with Search Logs for Document Ranking

Zhan Su, Zhicheng Dou, Yujia Zhou, Ziyuan Zhao, Ji-Rong Wen

Cited by 2

Abstract

Recently, pretrained models have achieved remarkable performance not only in natural language processing but also in information retrieval (IR). Previous studies show that IR-oriented pretraining tasks can achieve better performance than merely finetuning pretrained language models on IR datasets. In addition, the massive search log data collected by mainstream search engines can be used in IR pretraining, because it contains users' implicit judgments of document relevance for a concrete query. However, existing methods mainly use direct query-document click signals to pretrain models, and the potential supervision signals in search logs remain far from well explored. In this paper, we propose to comprehensively leverage four query-document (q-d) relevance relations, including co-interaction and multi-hop relations, to pretrain ranking models in IR. Specifically, we focus on users' click behavior and construct an Interaction Graph that represents the global relevance relations between queries and documents across all search logs. With the graph, we can capture co-interaction and multi-hop q-d relationships through neighboring nodes. Based on the relations extracted from the interaction graph, we propose four strategies to generate contrastive positive and negative q-d pairs and use these data to pretrain ranking models. Experimental results on both industrial and academic datasets demonstrate the effectiveness of our method.
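As a rough illustration of the idea, the following minimal Python sketch builds a bipartite click graph from a toy search log and reads co-interaction and multi-hop relations off neighboring nodes. The abstract does not spell out the four pair-generation strategies, so everything here is an assumption made for illustration: the function names (`build_interaction_graph`, `co_clicked_queries`, `multi_hop_docs`, `contrastive_pairs`), the toy log, and in particular the choice to treat unreachable documents as negatives are hypothetical and are not the authors' method.

```python
from collections import defaultdict


def build_interaction_graph(click_log):
    """Build a bipartite click graph from (query, clicked_doc) records."""
    q2d = defaultdict(set)  # query -> documents clicked for it
    d2q = defaultdict(set)  # document -> queries that led to a click on it
    for query, doc in click_log:
        q2d[query].add(doc)
        d2q[doc].add(query)
    return q2d, d2q


def co_clicked_queries(query, q2d, d2q):
    """Queries sharing at least one clicked document with `query`."""
    related = set()
    for doc in q2d.get(query, ()):
        related |= d2q[doc]
    related.discard(query)
    return related


def multi_hop_docs(query, q2d, d2q):
    """Documents reachable via query -> doc -> query -> doc, minus direct clicks."""
    docs = set()
    for other in co_clicked_queries(query, q2d, d2q):
        docs |= q2d[other]
    return docs - q2d.get(query, set())


def contrastive_pairs(query, q2d, d2q, all_docs):
    """Assumed labeling: clicked or multi-hop docs are positives, the rest negatives."""
    direct = q2d.get(query, set())
    hop = multi_hop_docs(query, q2d, d2q)
    positives = sorted(direct | hop)
    negatives = sorted(set(all_docs) - direct - hop)
    return positives, negatives


# Hypothetical toy log: q1 and q2 both clicked d1; q2 also clicked d2.
click_log = [("q1", "d1"), ("q2", "d1"), ("q2", "d2"), ("q3", "d3")]
q2d, d2q = build_interaction_graph(click_log)
positives, negatives = contrastive_pairs("q1", q2d, d2q, {"d1", "d2", "d3"})
```

In this toy log, `q1` never clicked `d2`, yet `d2` becomes a candidate positive because `q2` shares the click on `d1`. This is the kind of signal that direct click pairs alone would miss.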