ICLR 2026
EMFuse: Energy-based Model Fusion for Decision Making
Kejie He, Yi-Chen Li, Yang Yu
Abstract
Model fusion has emerged as a promising research direction, offering a resource-efficient paradigm that leverages existing pre-trained models to circumvent the need for training from scratch. In this work, we investigate the fusion of models specifically adapted for decision-making tasks. This challenge divides into two distinct yet related subproblems: the direct fusion of models that act as policies, and the fusion of dynamics models that subsequently induce a policy. We suggest that these seemingly divergent subproblems can be unified through the lens of energy-based models (EBMs), which parameterize a conditional distribution via an energy function where lower energy implies higher probability. Our framework, EMFuse, provides this unification by using energy as a common currency for fusion. For direct fusion of policies, such as those in language models, the output distribution is commonly a softmax (Boltzmann) distribution, which essentially defines the negative log-probability as an energy function. For dynamics models, existing works often train a set of models on the same dataset to obtain robust uncertainty estimates; such an ensemble approach leads to an exponential explosion in computational complexity when dynamics are fused across multiple sets of models. To overcome this, we introduce the Any-step Dynamics Energy-based Transition Model (ADETM), a novel architecture whose energy-based backbone provides efficient uncertainty estimation with a single model per dataset, thereby avoiding this computational explosion. Our EMFuse framework surpasses other baselines by 0.34% to 6.63% on single- and cross-domain discrete decision-making benchmarks, and achieves an extra 2.3 to 7.4 normalized points on average in D4RL MuJoCo continuous-control scenarios. Our code is available at https://github.com/LAMDA-RL/EMFuse.
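To make the softmax-as-energy reading concrete, the following minimal sketch treats the negative log-probability of a softmax policy as an energy and checks that the Boltzmann distribution over that energy recovers the original policy. The helper names (`energy_from_probs`, `boltzmann`, `fuse_energies`) and the toy logits are illustrative assumptions, not taken from the EMFuse code. The weighted sum of energies shown at the end is only one generic way to compose EBMs (a product of experts) and is not claimed to be the fusion rule used by EMFuse.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D array of logits.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def energy_from_probs(p):
    # Energy defined as the negative log-probability: lower energy <=> higher probability.
    return -np.log(p)

def boltzmann(energy):
    # Boltzmann distribution p(a) proportional to exp(-E(a)).
    return softmax(-energy)

def fuse_energies(energies, weights):
    # ASSUMPTION: a generic weighted sum of energies (product of experts),
    # shown only for illustration; not necessarily the EMFuse fusion rule.
    total = sum(w * e for w, e in zip(weights, energies))
    return boltzmann(total)

# Two hypothetical policies over three discrete actions in the same state.
logits_a = np.array([2.0, 0.5, -1.0])
logits_b = np.array([0.0, 1.5, 0.5])
p_a, p_b = softmax(logits_a), softmax(logits_b)

e_a, e_b = energy_from_probs(p_a), energy_from_probs(p_b)
p_roundtrip = boltzmann(e_a)                      # should equal p_a
p_fused = fuse_energies([e_a, e_b], [0.5, 0.5])   # normalized geometric mean of p_a and p_b
```

With equal weights of 0.5, the fused distribution in this sketch is the normalized geometric mean of the two policies, which illustrates why energies are a convenient shared quantity: models with different origins can be combined additively before a single normalization.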
* Equal Contribution † Corresponding Author

• Empirical gains: On single- and cross-domain discrete decision-making benchmarks, EMFuse improves accuracy by 0.34% to 6.63%, and on D4RL MuJoCo continuous control (Fu et al., 2020) it adds 2.3 to 7.4 normalized points on average over other fusion baselines.

PRELIMINARIES TOWARDS ENERGY-BASED MODELS FOR DECISION MAKING

Markov Decision Process. We consider a discounted MDP M = (S, A, P, r, γ) with an offline dataset D = {(s, a, r, s′)} collected by an unknown behavior policy π_β (Puterman, 1994; Sutton & Barto, 2018; Levine et al., 2020). Offline training faces a support gap: test-time states and actions may lie outside the empirical support of D, causing distribution shift and value overestimation. This makes calibrated uncertainty and support awareness central to algorithm design.

Behavior modeling: explicit vs. implicit. One axis of offline methods models the behavior distribution or constrains the learned policy by it. Explicit approaches fit an estimate of π_β(a | s) and use it as a prior or regularizer for