NeurIPS 2025

Brain-like Variational Inference

Hadi Vafaii, Dekel Galor, Jacob L. Yates


Abstract

Inference in both brains and machines can be formalized by optimizing a shared objective: maximizing the evidence lower bound (ELBO) in machine learning, or minimizing variational free energy (F) in neuroscience (ELBO = −F). While this equivalence suggests a unifying framework, it leaves open how inference is implemented in neural systems. Here, we introduce FOND (Free energy Online Natural-gradient Dynamics), a framework that derives neural inference dynamics from three principles: (1) natural gradients on F, (2) online belief updating, and (3) iterative refinement. We apply FOND to derive iP-VAE (iterative Poisson variational autoencoder), a recurrent spiking neural network that performs variational inference through membrane potential dynamics, replacing amortized encoders with iterative inference updates. Theoretically, iP-VAE yields several desirable features, such as emergent normalization via lateral competition and hardware-efficient integer spike-count representations. Empirically, iP-VAE outperforms both standard VAEs and Gaussian-based predictive coding models in sparsity, reconstruction, and biological plausibility, and scales to complex color image datasets such as CelebA. iP-VAE also exhibits strong generalization to out-of-distribution inputs, exceeding hybrid iterative-amortized VAEs. These results demonstrate how deriving inference algorithms from first principles can yield concrete architectures that are simultaneously biologically plausible and empirically effective.

Models that minimize F differ along two design choices:

1. Choice of distributions (appendix A.4): (i) approximate posterior $q_\phi(z \mid x)$, (ii) prior $p_\theta(z)$, and (iii) likelihood $p_\theta(x \mid z)$.
2. Choice of inference method (appendix A.8): (i) amortized (e.g., a learned neural network) vs. (ii) iterative (e.g., gradient descent).

Variational Autoencoder (VAE) model family. Variational Autoencoders (VAEs) transform the abstract ELBO objective into practical deep learning architectures [18–20].
The standard Gaussian VAE (G-VAE) exemplifies this approach by assuming factorized Gaussians for all three distributions, with the approximate posterior $q_\phi(z \mid x)$ implemented as a neural network that maps each input $x$ to posterior parameters: $\mathrm{enc}(x; \phi) \to (\mu(x), \sigma^2(x))$. This amortization of inference (using a single network to approximate posteriors across the entire dataset) is a defining characteristic of VAEs. Alternative distribution choices are also possible; for instance, replacing both prior and posterior with Poisson distributions yields the Poisson VAE (P-VAE; [50]), which better aligns with neural spike-count statistics [51–53]. We derive the VAE loss in appendix A.5 and discuss both G-VAE and P-VAE extensively in appendices A.6 and A.7.

Sparse coding and predictive coding as variational inference. Two major cornerstones of theoretical neuroscience, sparse coding (SC; [54]) and predictive coding (PC; [22]), can also be derived as instances of ELBO maximization (or, equivalently, F minimization), given specific distributional choices [33, 55, 56]. SC and PC share two key characteristics that distinguish them from standard VAEs. First, both use a Dirac-delta distribution for the approximate posterior, effectively collapsing it to a point estimate; they differ, however, in their prior assumptions: PC employs a Gaussian prior, while SC uses a sparsity-promoting prior (e.g., Laplace; Fig. 2). Second, instead of amortized inference as in VAEs, both PC and SC employ iterative inference, which better aligns with the recurrent nature of neural computation [40–44]. See appendix A.8 for an in-depth comparison, and appendix A.9 for a pedagogical derivation of the Rao and Ballard [22] objective as F minimization.

Online inference through natural gradient descent.
In a recent landmark paper, Khan and Rue [34] proposed the Bayesian learning rule (BLR), unifying seemingly disparate learning algorithms as instances of variational inference optimized via natural gradient descent [35, 57] on a Bayesian objective that reduces to the ELBO when the likelihood is known [58, 59]. BLR's generality extends naturally to sequential settings [60], where beliefs must be updated online as data arrive. In such settings, a rolling update scheme, in which the posterior at each step becomes the prior for the next (Fig. 7), provides a simple yet effective approach to continual belief revision [60].

Summary so far. In this background section and the corresponding appendix A, we reviewed how seemingly different models across machine learning and neuroscience can be understood as instances of F minimization (Fig. 2). The differences among them arise from the choices we make on the distributions and the inference methods (appendix A.10).
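To make the summary concrete, here is a minimal numerical sketch of how changing only the prior changes the model, while inference stays iterative. It assumes a linear dictionary `phi`, a Gaussian likelihood with variance `sigma2`, and a Dirac-delta (point-estimate) posterior at z, so that F reduces, up to terms independent of z, to a squared prediction error plus a prior penalty. A Gaussian prior gives a PC-like model; a Laplace prior with rate `lam` gives an SC-like model. All names and values (`phi`, `sigma2`, `lam`, `lr`, `infer`, `free_energy`) are hypothetical illustrations, not the paper's code. The Laplace case uses an ISTA-style soft-thresholding step, a standard substitute for plain gradient descent on the non-differentiable L1 term.

```python
import math


def mat_vec(a, v):
    """Return a @ v for a matrix given as a list of rows."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def mat_t_vec(a, v):
    """Return a.T @ v without materializing the transpose."""
    return [sum(a[i][j] * v[i] for i in range(len(a))) for j in range(len(a[0]))]


def soft_threshold(u, t):
    """Proximal operator of t * |u|: shrink toward zero, clip small values to 0."""
    return math.copysign(max(abs(u) - t, 0.0), u)


def free_energy(x, phi, z, prior, sigma2=1.0, lam=0.5):
    """F up to z-independent constants, for a Dirac-delta posterior at z."""
    residual = [x_i - p_i for x_i, p_i in zip(x, mat_vec(phi, z))]
    energy = sum(r * r for r in residual) / (2.0 * sigma2)  # -log p(x|z) + const
    if prior == "gaussian":  # -log N(z; 0, I) + const: PC-like prior
        return energy + 0.5 * sum(z_i * z_i for z_i in z)
    if prior == "laplace":  # -log Laplace(z; 0, 1/lam) + const: SC-like prior
        return energy + lam * sum(abs(z_i) for z_i in z)
    raise ValueError(f"unknown prior: {prior!r}")


def infer(x, phi, prior, sigma2=1.0, lam=0.5, lr=0.1, n_steps=500):
    """Iteratively refine the point estimate z to minimize F (no amortized encoder)."""
    z = [0.0] * len(phi[0])
    for _ in range(n_steps):
        error = [x_i - p_i for x_i, p_i in zip(x, mat_vec(phi, z))]  # prediction error
        drive = [d / sigma2 for d in mat_t_vec(phi, error)]  # minus grad of likelihood term
        if prior == "gaussian":
            # Plain gradient descent on F: the prior pulls every z_i toward 0.
            z = [z_i + lr * (d_i - z_i) for z_i, d_i in zip(z, drive)]
        elif prior == "laplace":
            # ISTA step: gradient on the likelihood, then soft-threshold for lam*|z|_1.
            z = [soft_threshold(z_i + lr * d_i, lr * lam) for z_i, d_i in zip(z, drive)]
        else:
            raise ValueError(f"unknown prior: {prior!r}")
    return z


if __name__ == "__main__":
    x = [2.0, 0.3, -1.5]
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for prior in ("gaussian", "laplace"):
        z = infer(x, identity, prior)
        print(prior, [round(z_i, 3) for z_i in z], round(free_energy(x, identity, z, prior), 3))
```

With the identity dictionary and `lam = 0.5`, the Gaussian prior only shrinks every coordinate of `x` by half, whereas the Laplace prior sets the weakly supported middle coordinate (0.3) to exactly zero. This mirrors the sparsity difference between SC and PC described above, even though both models share the same likelihood and the same iterative inference loop.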