STOC2021
Efficient and near-optimal algorithms for sampling connected subgraphs
Marco Bressan
Abstract
We study the graphlet sampling problem: given an integer k ≥ 3 and a graph G=(V,E), sample a connected induced k-node subgraph of G (also called k-graphlet) uniformly at random. This is a fundamental graph mining primitive, with applications in social network analysis and bioinformatics. The two state-of-the-art techniques are random walks and color coding. The random walk is elegant, but the current upper bounds and lower bounds on its mixing time suffer a gap of Δk−1 where Δ is the maximum degree of G. Color coding is better understood, but requires a 2O(k) m-time preprocessing over the entire graph. Moreover, no efficient algorithm is known for sampling graphlets uniformly — random walks and color coding yield only є-uniform samples. In this work, we provide the following results: