STOC2025

Bounded Edit Distance: Optimal Static and Dynamic Algorithms for Small Integer Weights

Egor Gorbachev, Tomasz Kociumaka

1 citation

Abstract

The edit distance (also known as the Levenshtein distance) of two strings is the minimum number of character insertions, deletions, and substitutions needed to transform one string into the other. The textbook algorithm determines the edit distance of two length-n strings in O(n2) time, and one of the foundational results of fine-grained complexity is that any polynomial-factor improvement upon this quadratic runtime would violate the Orthogonal Vectors Hypothesis. In the bounded version of the problem, where the complexity is parameterized by the value k of the edit distance, the classic algorithm of Landau and Vishkin [JCSS’88] achieves O(n+k2) time, which is optimal (up to sub-polynomial factors and conditioned on OVH) as a function of n and k. While the Levenshtein distance is a fundamental theoretical notion, most practical applications use weighted edit distance, where the weight (cost) of each edit can be an arbitrary real number in [1,∞) that may depend on the edit type and the characters involved. Unfortunately, the Landau–Vishkin algorithm does not generalize to the weighted setting and, for many decades, a simple O(nk)-time dynamic programming procedure remained the state of the art for bounded weighted edit distance. Only recently, Das, Gilbert, Hajiaghayi, Kociumaka, and Saha [STOC’23] provided an O(n+k5)-time algorithm; shortly afterward, Cassis, Kociumaka, and Wellnitz [FOCS’23] presented an Õ(n+√nk3)-time solution (where Õ(·) hides polylogn factors) and proved this runtime optimal for √n ≤ k ≤ n (up to sub-polynomial factors and conditioned on the All-Pairs Shortest Paths Hypothesis). Notably, the hard instances constructed to establish the underlying conditional lower bound use fractional weights with large denominators, reaching (n), which stands in contrast to weight functions used in practice (e.g., in bioinformatics) that, after normalization, typically attain small integer values. Our first main contribution is a surprising discovery that the Õ(n+k2) running time of the Landau–Vishkin algorithm can be recovered if the weights are small integers (e.g., if some specific weight function is fixed in a given application). In general, our solution takes Õ(n+minW,√k· k2) time for integer weights not exceeding a threshold W. Despite matching time complexities, we do not use the Landau–Vishkin framework; instead, we build upon the recent techniques for arbitrary weights. For this, we exploit extra structure following from integer weights and avoid further bottlenecks using several novel ideas to give faster and more robust implementations of multiple steps of the previous approach. Next, we shift focus to the dynamic version of the unweighted edit distance problem, which asks to maintain the edit distance of two strings that change dynamically, with each update modeled as a single edit (character insertion, deletion, or substitution). For many years, the best approach for dynamic edit distance combined the Landau–Vishkin algorithm with a dynamic strings implementation supporting efficient substring equality queries, such as one by Mehlhorn, Sundar, and Uhrig [SODA’94]; the resulting solution supports updates in Õ(k2) time. Recently, Charalampopoulos, Kociumaka, and Mozes [CPM’20] observed that a framework of Tiskin [SODA’10] yields a dynamic algorithm with an update time of Õ(n). This is optimal in terms of n: significantly faster updates would improve upon the static O(n2)-time algorithm and violate OVH. With the state-of-the-art update time at Õ(minn,k2), an exciting open question is whether Õ(k) update time is possible. Our second main contribution is an affirmative answer to this question: we present a deterministic dynamic algorithm that maintains the edit distance in Õ(k) worst-case time per update. Surprisingly, this result builds upon our new static solution and hence natively supports small integer weights: if they do not exceed a threshold W, the update time becomes Õ(W2 k).