ICLR 2025
Long-Short Decision Transformer: Bridging Global and Local Dependencies for Generalized Decision-Making
Jincheng Wang, Penny Karanasou, Pengyuan Wei, Elia Gatti, Diego Martínez Plasencia, Dimitrios Kanoulas
Abstract
Decision Transformers (DTs) effectively capture long-range dependencies using self-attention but struggle with fine-grained local relationships, especially the Markovian properties found in many offline-RL datasets. Conversely, Decision ConvFormer (DC) uses convolutional filters to capture local patterns but shows limitations in tasks that demand long-term dependencies, such as Maze2d. To address these limitations and leverage both strengths, we propose the Long-Short Decision Transformer (LSDT), a general-purpose architecture that captures global and local dependencies through two specialized parallel branches (self-attention and convolution). We explore how these branches complement each other by modeling dependencies of various ranges across different environments, and compare LSDT against other baselines. Experimental results demonstrate that LSDT achieves state-of-the-art performance and notable gains over the standard DT on the D4RL offline RL benchmark. Leveraging the parallel architecture, LSDT performs consistently on diverse datasets, both Markovian and non-Markovian. We also demonstrate the flexibility of LSDT's architecture: its specialized branches can be replaced or integrated into models like DC to improve performance in capturing diverse dependencies. Finally, we highlight the role of goal states in improving decision-making for goal-reaching tasks like Antmaze.

We evaluate our proposed approaches on the standard D4RL benchmark (Fu et al., 2020). Results from our studies demonstrate consistent enhancement and superior performance of LSDT in decision-making compared to DT and its variants. LSDT also achieves performance comparable to state-of-the-art RL methods. Additionally, the goal-state conditioning method significantly enhances the performance of transformer-based RL, as shown in the Antmaze and Maze2d tasks.
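To make the idea of two parallel branches concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes that the feature channels are split between the branches and the outputs concatenated, which the text above does not specify. The long-range branch is reduced to single-head causal self-attention with identity projections, and the short-range branch to a causal convolution with one fixed kernel shared across channels, in place of the Dynamic Conv or DC filters. All function names and the `split` and `kernel` parameters are hypothetical.

```python
import math

def causal_self_attention(x):
    """Single-head causal self-attention with identity Q/K/V projections
    (a simplification for illustration). x is a list of T feature vectors."""
    T, d = len(x), len(x[0])
    out = []
    for t in range(T):
        # Position t attends only to positions 0..t (causal mask).
        scores = [sum(x[t][k] * x[s][k] for k in range(d)) / math.sqrt(d)
                  for s in range(t + 1)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(sc - m) for sc in scores]
        z = sum(weights)
        out.append([sum(weights[s] * x[s][k] for s in range(t + 1)) / z
                    for k in range(d)])
    return out

def causal_local_conv(x, kernel):
    """Causal 1D convolution applied to each channel independently.
    kernel[j] weights the input j steps in the past; the same kernel is
    shared across channels here (an assumption made for brevity)."""
    T, d = len(x), len(x[0])
    out = []
    for t in range(T):
        row = []
        for k in range(d):
            acc = 0.0
            for j, w in enumerate(kernel):
                if t - j >= 0:  # positions before the sequence start count as zero
                    acc += w * x[t - j][k]
            row.append(acc)
        out.append(row)
    return out

def long_short_block(x, split, kernel):
    """Hypothetical two-branch block: the first `split` channels go to the
    attention branch, the rest to the convolution branch, and the two
    outputs are concatenated back along the channel dimension."""
    long_out = causal_self_attention([row[:split] for row in x])
    short_out = causal_local_conv([row[split:] for row in x], kernel)
    return [a + b for a, b in zip(long_out, short_out)]

if __name__ == "__main__":
    tokens = [[1.0, 0.0, 1.0, 2.0],
              [0.0, 1.0, 3.0, 4.0],
              [1.0, 1.0, 5.0, 6.0]]
    for row in long_short_block(tokens, split=2, kernel=[0.5, 0.5]):
        print(row)
```

In this sketch the attention branch can draw on the entire history at every step, while the convolution branch only sees the last `len(kernel)` steps. This mirrors the global and local split in spirit only; a real model would add learned projections, multiple heads, normalization, and feed-forward layers.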
To validate the flexibility of LSDT, we replaced the Dynamic Conv in the short-term branch with DC's filters; the observed improvements (e.g., Figure 2) on non-Markovian datasets confirm that our approach is straightforward and plug-and-play with minimal adjustments.

PRELIMINARIES

Offline Reinforcement Learning. In RL, the learning environment can be considered as a Markov Decision Process (MDP), defined by the tuple (S, A, P, R). Here, S represents the set of possible states, A represents the set of actions, P represents the state transition probabilities P(s′ | s, a), and R