ACL 2025

HotelMatch-LLM: Joint Multi-Task Training of Small and Large Language Models for Efficient Multimodal Hotel Retrieval

Arian Askari, Emmanouil Stergiadis, Ilya Gusev, Moran Beladev

Abstract

We present HotelMatch-LLM, a multimodal dense retrieval model for the travel domain that enables natural language property search, addressing the limitations of traditional travel search engines, which require users to start with a destination and then edit search parameters. HotelMatch-LLM features three key innovations: (1) domain-specific multi-task optimization with three novel retrieval, visual, and language modeling objectives; (2) an asymmetrical dense retrieval architecture combining a small language model (SLM) for efficient online query processing and a large language model (LLM) for embedding hotel data; and (3) extensive image processing to handle all property image galleries. Experiments on four diverse test sets show that HotelMatch-LLM significantly outperforms state-of-the-art models, including VISTA and MARVEL. Specifically, on the test set (main query type), we achieve 0.681 for HotelMatch-LLM compared to 0.603 for the most effective baseline, MARVEL. Our analysis highlights the impact of our multi-task optimization, the generalizability of HotelMatch-LLM across LLM architectures, and its scalability for processing large image galleries.

2 In this work, the term 'hotel' is used as a general reference to various types of accommodations, including but not limited to hotels, bed and breakfasts, and private homes.
3 We refer to models of 110 million parameters as SLMs and to language models ranging from 330 million to 7 billion parameters as LLMs.
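The asymmetrical retrieval setup summarized in the abstract can be illustrated with a minimal sketch: hotel texts are embedded offline by a costlier encoder, queries are embedded online by a cheaper one, and both land in the same vector space so they can be compared by cosine similarity. Everything in the sketch is an assumption made for illustration. The function names (`encode_query_small`, `encode_hotel_large`, `build_index`, `search`), the dimension `DIM`, and the hashed bag-of-words features are hypothetical stand-ins for the trained SLM and LLM towers, not the authors' implementation, and images are not modeled.

```python
import math
import zlib

DIM = 4096  # size of the shared embedding space (illustrative value)


def _bucket(feature):
    # Deterministic feature hashing; Python's built-in hash() is salted per process.
    return zlib.crc32(feature.encode("utf-8")) % DIM


def _normalize(vec):
    # Scale to unit length so that a dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def encode_query_small(text):
    """Cheap stand-in for the SLM query tower: unigram features only."""
    vec = [0.0] * DIM
    for tok in text.lower().split():
        vec[_bucket(tok)] += 1.0
    return _normalize(vec)


def encode_hotel_large(text):
    """Costlier stand-in for the LLM hotel tower: unigrams plus bigrams."""
    toks = text.lower().split()
    vec = [0.0] * DIM
    for tok in toks:
        vec[_bucket(tok)] += 1.0
    for first, second in zip(toks, toks[1:]):
        vec[_bucket(first + " " + second)] += 1.0
    return _normalize(vec)


def build_index(hotel_texts):
    # Offline step: run the expensive encoder once per hotel.
    return [encode_hotel_large(t) for t in hotel_texts]


def search(query, index, k=3):
    # Online step: only the cheap query encoder runs at request time.
    q = encode_query_small(query)
    scores = [sum(a * b for a, b in zip(q, doc)) for doc in index]
    order = sorted(range(len(index)), key=lambda i: -scores[i])[:k]
    return [(i, scores[i]) for i in order]


hotels = [
    "beach resort with pool in Spain",
    "ski chalet near the slopes in Austria",
    "city hotel close to the museum",
]
index = build_index(hotels)
results = search("ski chalet slopes", index, k=2)
```

The point of the split is cost placement: the heavy encoder is paid once per hotel at indexing time, while each incoming query only pays for the light encoder and a similarity scan. In a trained system the two towers would be aligned by the retrieval objective rather than by shared hand-built features as in this toy.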