ICLR 2026

Offline Reinforcement Learning with Adaptive Feature Fusion

Tieru Wang, Kunbao Wu, Guoshun Nan

Abstract

Return-conditioned supervised learning (RCSL) algorithms have demonstrated strong generative capabilities in offline reinforcement learning (RL) by learning action distributions conditioned on both the state and the return. However, existing approaches treat RL as a conditional sequence modeling task, in which actions are predicted from historical context and a target return. This leads to a critical flaw: the policy can overfit to the specific, often suboptimal, actions found within those historical contexts. Consequently, even when conditioned on a high target return, the model struggles to synthesize a correspondingly high-quality action sequence, which fundamentally limits its ability to perform effective trajectory stitching and to outperform the behavior policy. To address these limitations, we propose a novel approach, the Q-Augmented Dual-Feature Fusion Decision Transformer (QDFFDT). Our key innovation is a learnable fusion mechanism that explicitly separates and then adaptively combines global, history-aware sequence features with local, immediate Markovian features. This introduces a structural bias that prioritizes single-step dynamics while still leveraging long-term context, improving generalization without the need for extensive hyperparameter tuning. Experimental results on the D4RL benchmark show that QDFFDT outperforms current state-of-the-art methods, demonstrating the power of adaptive feature fusion for robust offline RL. Our code is available at https://github.com/wangtieru2/QDFFDT.
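
To make the idea of adaptively combining global and local features concrete, the following is a minimal sketch of one common way such a fusion can be written: a sigmoid gate that forms a convex combination of the two feature vectors. The abstract does not specify the form of the fusion, so the function name `gated_fusion`, the scalar gate, and the parameters `gate_weights` and `gate_bias` are all assumptions made for illustration and are not taken from the QDFFDT code.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(
    global_feat: Sequence[float],
    local_feat: Sequence[float],
    gate_weights: Sequence[float],
    gate_bias: float = 0.0,
) -> List[float]:
    """Blend a history-aware vector with a single-step vector.

    Hypothetical form (an assumption, not the paper's definition):
        gate  = sigmoid(w . [global; local] + b)
        fused = gate * local + (1 - gate) * global
    In a trained model, gate_weights and gate_bias would be learned.
    """
    if len(global_feat) != len(local_feat):
        raise ValueError("global and local features must have equal length")
    if len(gate_weights) != 2 * len(global_feat):
        raise ValueError("gate_weights must cover the concatenated features")

    # The gate sees both feature sets, so the mix can depend on the input.
    concat = list(global_feat) + list(local_feat)
    logit = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
    gate = sigmoid(logit)

    # Convex combination: gate near 1 favors the local (Markovian) features.
    return [gate * l + (1.0 - gate) * g for g, l in zip(global_feat, local_feat)]


if __name__ == "__main__":
    g = [0.0, 2.0]  # stand-in for sequence-model features
    l = [2.0, 4.0]  # stand-in for single-step state features
    # Zero weights and zero bias give a gate of 0.5, i.e. an equal blend.
    print(gated_fusion(g, l, gate_weights=[0.0] * 4, gate_bias=0.0))
    # A large positive bias pushes the gate toward the local features.
    print(gated_fusion(g, l, gate_weights=[0.0] * 4, gate_bias=50.0))
```

In this sketch, a positive `gate_bias` is one simple way to express a preference for single-step features while still letting the gate shift toward the history-aware features when the input calls for it.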