ICLR 2026

Temporal Geometry of Deep Networks: Hyperbolic Representations of Training Dynamics for Intrinsic Explainability

Ambarish Moharil

Abstract

Intrinsic explainability remains a challenging problem, particularly in settings where multilayer perceptrons (MLPs) must be dynamically retrained within an optimization environment. This paper investigates how MLPs and their training dynamics can be represented and studied in non-Euclidean spaces, specifically the Poincaré ball model of hyperbolic geometry. Our aim is to capture how the weighted topology of these networks evolves and self-organizes over time. Instead of restricting the analysis to single checkpoints, as established measure-based explainability methods do, we construct temporal parameter graphs: snapshots of an MLP's parameters taken at T steps of the optimization (training) process. This reflects the view that neural networks encode information not only in their weights but also in the trajectory traced during training. Drawing on the idea that many complex networks admit embeddings in hidden metric spaces where distances correspond to connection likelihood, we present a geometric, temporal, graph-based meta-learning framework for obtaining dynamic hyperbolic representations of the underlying neural parameter graphs. Our model embeds temporal parameter graphs in the Poincaré ball and learns from them while maintaining equivariance to within-snapshot neuron permutations and invariance to permutations of past snapshots. In doing so, the approach preserves functional equivalence over time and recovers the latent, evolving geometry of the network. Experiments on regression and classification tasks with trained MLPs show strong meta-network performance, accompanied by hyperbolic temporal representations that reveal how network structure emerges over time under specific training environments, thus providing insight into the network's self-organization.

Neural Meta Networks. Neural networks can themselves be treated as data.
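To make the idea of treating a network as data concrete, here is a minimal NumPy sketch under our own assumptions, not the paper's implementation. A toy one-hidden-layer regression MLP is trained with plain full-batch gradient descent, and at each of T checkpoints its parameters are converted into a parameter graph (neurons as nodes with biases as node features, weights as edges). The sketch also checks the hidden-neuron permutation symmetry that motivates equivariance, and computes the standard Poincaré-ball geodesic distance (curvature -1). All function names (`init_mlp`, `sgd_step`, `parameter_graph`, `permute_hidden`, `poincare_distance`), the toy data, and the hyperparameters are hypothetical.

```python
import numpy as np

# Hypothetical helpers for illustration only; not the paper's implementation.


def init_mlp(sizes, rng):
    """Random MLP; weights[l] has shape (sizes[l + 1], sizes[l])."""
    weights = [rng.normal(0.0, 0.5, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(m) for m in sizes[1:]]
    return weights, biases


def forward(weights, biases, x):
    """Batched forward pass with ReLU on every hidden layer."""
    h = x
    for l, (W, b) in enumerate(zip(weights, biases)):
        h = h @ W.T + b
        if l < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h


def sgd_step(weights, biases, x, y, lr=0.05):
    """One full-batch gradient step on MSE for a one-hidden-layer ReLU MLP."""
    (W1, W2), (b1, b2) = weights, biases
    pre = x @ W1.T + b1
    h = np.maximum(pre, 0.0)
    dout = 2.0 * (h @ W2.T + b2 - y) / len(x)  # dLoss/dOutput
    dpre = (dout @ W2) * (pre > 0)              # backprop through ReLU
    return ([W1 - lr * dpre.T @ x, W2 - lr * dout.T @ h],
            [b1 - lr * dpre.sum(0), b2 - lr * dout.sum(0)])


def parameter_graph(weights, biases):
    """Neurons become nodes (bias as node feature); weights become edges."""
    sizes = [weights[0].shape[1]] + [W.shape[0] for W in weights]
    offsets = np.cumsum([0] + sizes)
    node_feats = np.concatenate([np.zeros(sizes[0])] + list(biases))
    edges = [(offsets[l] + i, offsets[l + 1] + j, W[j, i])
             for l, W in enumerate(weights)
             for j in range(W.shape[0]) for i in range(W.shape[1])]
    return node_feats, edges


def permute_hidden(weights, biases, layer, perm):
    """Reorder the neurons of hidden layer `layer`; the computed function is unchanged."""
    W = [w.copy() for w in weights]
    b = [v.copy() for v in biases]
    W[layer], b[layer] = W[layer][perm], b[layer][perm]  # rows: incoming weights
    W[layer + 1] = W[layer + 1][:, perm]                 # columns: outgoing weights
    return W, b


def poincare_distance(u, v):
    """Geodesic distance in the Poincare ball of curvature -1 (requires |u|, |v| < 1)."""
    sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq / denom)


rng = np.random.default_rng(0)
x = rng.normal(size=(64, 2))
y = np.sin(x[:, :1]) + 0.5 * x[:, 1:]  # toy regression target
weights, biases = init_mlp([2, 8, 1], rng)

T = 5  # number of checkpoints in the temporal parameter graph
snapshots = []
for t in range(T):
    snapshots.append(parameter_graph(weights, biases))
    for _ in range(20):
        weights, biases = sgd_step(weights, biases, x, y)

# Permuting hidden neurons preserves the function but reorders the flat vector.
perm = np.roll(np.arange(8), 1)
Wp, bp = permute_hidden(weights, biases, 0, perm)
flat = np.concatenate([w.ravel() for w in weights])
flat_p = np.concatenate([w.ravel() for w in Wp])
print(np.allclose(forward(weights, biases, x), forward(Wp, bp, x)))  # True
print(np.allclose(flat, flat_p))                                     # False
print(poincare_distance(np.zeros(2), np.array([0.5, 0.0])))          # ln 3, about 1.0986
```

Permuting the hidden neurons leaves the outputs and the multiset of edge weights unchanged while reordering the flattened parameter vector, which is why a meta-network over such graphs is naturally built to be permutation-equivariant rather than to consume flattened weights. Learning the hyperbolic node embeddings themselves is not shown here.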
Early meta-network approaches flattened parameters or relied on simple statistics, which ignored neuron permutation symmetry and had limited cross-architecture generalization (