STOC2024
Almost Linear Size Edit Distance Sketch
Michal Koucký, Michael E. Saks
1 citation
Abstract
We design an almost linear-size sketching scheme for computing edit distance up to a given threshold k. The scheme consists of two algorithms, a sketching algorithm and a recovery algorithm. The sketching algorithm depends on the parameter k and takes as input a string x and a public random string ρ and computes a sketch skρ(x;k), which is a compressed version of x. The recovery algorithm is given two sketches skρ(x;k) and skρ(y;k) as well as the public random string ρ used to create the two sketches, and (with high probability) if the edit distance ED(x,y) between x and y is at most k, will output ED(x,y) together with an optimal sequence of edit operations that transforms x to y, and if ED(x,y) > k will output large. The size of the sketch output by the sketching algorithm on input x is k2O(√log(n)loglog(n)) (where n is an upper bound on length of x). The sketching and recovery algorithms both run in time polynomial in n. The dependence of sketch size on k is information theoretically optimal and improves over the quadratic dependence on k in schemes of Kociumaka, Porat and Starikovskaya (FOCS’2021), and Bhattacharya and Koucký (STOC’2023).