WWW2026
PAOSC: Plug-and-play Attention Optimization for Semantic Consistency in LLMs
Chang Li, Yawei Liu, Chun Long, Jing Zhao, Guanyao Du
Abstract
Attention mechanisms are essential to the success of Large Language Models (LLMs). In practice, models often overemphasize semantically low-value tokens, forming attention sinks while failing to capture truly informative tokens. Existing inference-time optimization methods mainly rely on static adjustments or attention redistribution, which often disrupt the correspondence between attention distribution and the actual semantics of the input, leading to a loss of semantic consistency and degraded performance. To address this problem, we propose PAOSC, a plug-and-play attention optimization model designed to maintain semantic consistency by dynamically adjusting attention. PAOSC employs a generator to identify informative tokens and a discriminator to optimize the generator via policy gradients based on confidence changes and loss fluctuations. Experiments on eight LLMs show up to a 9.68% improvement in the F1 score. On the constructed HTTP-RL dataset, PAOSC eliminates 18% of low-value tokens, improving inference efficiency while maintaining semantic consistency. Our code is available at https://github.com/ChangLi000/PAOSC.
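To make the generator and discriminator interaction concrete, the following is a minimal toy sketch of a policy-gradient (REINFORCE) update for a per-token keep/drop policy. It is not PAOSC's implementation. The function names (`toy_reward`, `train_keep_policy`), the per-token Bernoulli policy, the moving-average baseline, and the hand-set `value` scores that stand in for a reward built from confidence changes and loss fluctuations are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def toy_reward(mask, value):
    # Hypothetical stand-in for the discriminator's signal. In the abstract the
    # reward comes from confidence changes and loss fluctuations of the LLM;
    # here each kept token simply contributes a fixed, hand-set score.
    return sum(m * v for m, v in zip(mask, value))


def train_keep_policy(value, steps=2000, lr=0.1, seed=0):
    """REINFORCE on independent per-token Bernoulli keep/drop decisions."""
    rng = random.Random(seed)
    logits = [0.0] * len(value)  # "generator" parameters: one logit per token
    baseline = 0.0               # moving-average baseline to reduce variance
    for _ in range(steps):
        probs = [sigmoid(l) for l in logits]
        # Sample a keep (1) / drop (0) mask from the current policy.
        mask = [1 if rng.random() < p else 0 for p in probs]
        reward = toy_reward(mask, value)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # d/d(logit) of log Bernoulli(mask; sigmoid(logit)) is (mask - prob).
        logits = [
            l + lr * advantage * (m - p)
            for l, m, p in zip(logits, mask, probs)
        ]
    return [sigmoid(l) for l in logits]


if __name__ == "__main__":
    # Positive scores mark informative tokens, negative scores low-value ones.
    keep_probs = train_keep_policy([1.0, -1.0, 1.0, -1.0])
    print([round(p, 3) for p in keep_probs])
```

In this toy setting the keep probability drifts upward for tokens with positive scores and downward for tokens with negative scores, which mirrors the idea of retaining informative tokens while discarding low-value ones. A real system would replace `toy_reward` with a signal computed from the underlying LLM.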