WWW2026

Accurate and Efficient Personalized Query Rewriting in Baidu Search

Xu Chu, Angela Li, Jiaming Zhang, Wei Li, Zhijie Tan, Dawei Yin, Shuaiqiang Wang, Daiting Shi

Abstract

Traditional search engines return the same results for identical queries, overlooking users' personalized intents. Although personalized search has been studied extensively, query rewriting for personalized intents has remained limited to traditional approaches such as statistical co-occurrence and synonym expansion. Recent work primarily addresses multi-turn dialogue in conversational AI rather than large-scale search engines, typically because real search-scenario data is unavailable and inferring users' intents through personalized reasoning is difficult. Large Language Models (LLMs) with Chain-of-Thought (CoT) reasoning make such personalized reasoning possible, but CoT adds decoding overhead that is hard to accept in online scenarios requiring low latency. To address this, we propose PicQue (Personalized Efficient Query Rewrite), a training pipeline for personalized query rewriting models that targets high accuracy at low latency. PicQue trains in two stages. The first stage applies a novel Hybrid Supervised Fine-Tuning strategy that retains the model's reasoning capabilities while allowing decoding to skip CoT, thereby obtaining CoT's accuracy gains without increasing latency. The second stage applies reinforcement learning with Group Relative Policy Optimization (GRPO) to further improve rewriting accuracy and reduce the risk of over-rewriting. We also propose a Guided Search strategy for GRPO training that alleviates the loss of training-sample utilization when all sampled rollouts are wrong. Extensive offline and online experiments demonstrate PicQue's effectiveness. Offline, PicQue improves rewriting accuracy by over 7% compared with baseline methods and reduces decoding tokens by up to 95%. In online A/B tests, user satisfaction increases by 1.78% and the query change rate decreases by 0.71%, a significant online gain.
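To illustrate why a group in which every rollout is wrong contributes little to GRPO training, the following minimal sketch computes group-relative advantages for one sampled group. It is a sketch under stated assumptions and not the PicQue implementation: the binary reward (1.0 for a correct rewrite, 0.0 otherwise), the function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` stabilizer are all illustrative choices. The Guided Search strategy itself is not sketched, since the abstract does not describe its mechanics.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own sampling group.

    Hypothetical helper for illustration: advantage_i = (r_i - mean) / (std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group with mixed outcomes: correct rollouts get positive advantages,
# incorrect ones get negative advantages.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# A group where every rollout is wrong: all rewards are equal, so every
# advantage is zero and the sample yields no policy-gradient signal.
all_wrong = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

print(mixed)
print(all_wrong)
```

Under these assumptions, the all-wrong group produces advantages that are all zero, which is the reduced sample utilization that the abstract says Guided Search is meant to alleviate.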