ACL 2024
DADA: Distribution-Aware Domain Adaptation of PLMs for Information Retrieval
Dohyeon Lee, Jongyoon Kim, Seung-won Hwang, Joonsuk Park
Abstract
Pre-trained language models (PLMs) exhibit promising retrieval performance in various domains. However, they struggle in domains unseen during training, since the word distribution can shift significantly. To remedy this, GPL, a generative domain adaptation (DA) method, was proposed to generate pseudo queries and labels for documents in unseen domains to further train the retriever model. However, the pseudo queries often do not resemble real queries from the target domains, as they do not integrate the domain's distributional information. We propose Distribution-Aware Domain Adaptation (DADA) to guide the model to incorporate the term distributions at both the document level and the corpus level, which we refer to as observation-level and domain-level feedback, respectively. Empirical results on five distinct datasets demonstrate that our method effectively adapts the model to target domains and expands document representations to cover unseen gold query terms.