ICLR2026

Flow Matching with Semidiscrete Couplings

Alireza Mousavi-Hosseini, Stephen Y. Zhang, Michal Klein, marco cuturi

被引用 6 次

摘要

Flow models parameterized as time-dependent velocity fields can generate data from noise by integrating an ODE. These models are often trained using flow matching, i.e. by sampling random pairs of noise and target points $(x_0,x_1)$ and ensuring that the velocity field is aligned, on average, with $x_1-x_0$ when evaluated along a time-indexed segment linking $x_0$ to $x_1$ . While these noise/data pairs are sampled independently by default, they can also be selected more carefully by matching batches of $n$ noise to $n$ target points using an optimal transport (OT) solver. Although promising in theory, the OT flow matching (OT-FM) approach (Pooladian et al., 2023, Tong et al., 2024) is not widely used in practice. Zhang et al. (2025), pointed out recently that OT-FM truly starts paying off when the batch size $n$ grows significantly, which only a multi-GPU implementation of the Sinkhorn algorithm can handle. Unfortunately, the pre-compute costs of running Sinkhorn can quickly balloon, requiring $O(n^2/\varepsilon^2)$ operations for every $n$ pairs used to fit the velocity field, where $\varepsilon$ is a regularization parameter that should be typically small to yield better results. To fulfill the theoretical promises of OT-FM, we propose to move away from batch-OT and rely instead on a semidiscrete formulation that can leverage the fact that the target dataset distribution is usually of finite size $N$ . The SD-OT problem is solved by estimating a dual potential vector of size $N$ using SGD; using that vector, freshly sampled noise vectors at train time can then be matched with data points at the cost of a maximum inner product search (MIPS) over the dataset. Semidiscrete FM (SD-FM) removes the quadratic dependency on $n/\varepsilon$ that bottlenecks OT-FM. SD-FM beats both FM and OT-FM on all training metrics and inference budget constraints, across multiple datasets, on unconditional/conditional generation, or when using mean-flow models.