ICLR2026

Perturbed Dynamic Time Warping: A Probabilistic Framework and Generalized Variants

Xiangqian Sun, Chaoqun Wang, Wei Zhang

Abstract

Dynamic Time Warping (DTW) is a classical method for measuring similarity between time series, but its non-differentiability hinders integration into end-to-end learning frameworks. To address this, soft-DTW replaces the minimum operator with a smooth soft-min, enabling differentiability and efficient computation. Motivated by soft-DTW, we propose perturbed-DTW, a differentiable framework of DTW obtained by adding random perturbations to warping costs and taking the expected minimum. Under Gumbel noise, perturbed-DTW exactly recovers soft-DTW, providing a natural probabilistic interpretation of soft-DTW. We further generalize this framework by extending the Gumbel noise to the broader family of generalized extreme value (GEV) distributions, leading to a new class of soft-DTW variants. Building on this insight, we introduce nested-soft-DTW (ns-DTW), which integrates GEV perturbations into the dynamic programming formulation of perturbed-DTW. This extension induces alignments with tunable skewness, offering greater flexibility in modeling diverse alignment structures. We validate ns-DTW on barycenter computation, clustering, and classification, demonstrating its effectiveness over existing approaches. * corresponding author However, directly computing DTW via (1) is intractable due to the exponential size of A m,n . Instead, DTW can be computed efficiently via dynamic programming (DP). To formulate the DP, define D i,j as the minimal cost of alignment from time series [x 1 , . . . , x i ] and [y 1 , . . . , y j ]. Then, D i,j satisfies the Bellman equation (Bellman, 1952) , ( ) with boundary conditions D i,1 = i k=1 C k,1 , ∀i ∈ [m], and D 1,j = j k=1 C 1,k , ∀j ∈ [n]. Then DTW distance is then given by D m,n , i.e., DTW(C) = D m,n . Therefore, the optimal alignment