ICML 2024

Faster Streaming and Scalable Algorithms for Finding Directed Dense Subgraphs in Large Graphs

Slobodan Mitrovic, Theodore Pan

Cited by 1

Abstract

Finding dense subgraphs is a fundamental algorithmic tool in data mining, community detection, and clustering. In this problem, one aims to find an induced subgraph whose edge-to-vertex ratio is maximized. We study the directed case of this question in the context of semi-streaming and massively parallel algorithms. In particular, we show that it is possible to find a $(2+\epsilon)$-approximation on randomly ordered streams, even in a single pass, using $O(n \cdot \mathrm{poly}\log n)$ memory on $n$-vertex graphs. Our result improves over prior works, which were designed for arbitrarily ordered streams: the algorithm by Bahmani et al. (VLDB 2012), which uses $O(\log n)$ passes, and the work by Esfandiari et al. (2015), which makes one pass but uses $O(n^{3/2})$ memory. Moreover, our techniques extend to the Massively Parallel Computation model, yielding $O(1)$ rounds in the super-linear memory regime and $O(\sqrt{\log n})$ rounds in the nearly-linear memory regime. This constitutes a quadratic improvement over the state-of-the-art bounds of Bahmani et al. (VLDB 2012 and WAW 2014), which require $O(\log n)$ rounds even in the super-linear memory regime. Finally, we empirically evaluate our single-pass semi-streaming algorithm on 6 benchmarks and show that, even on non-randomly ordered streams, the quality of its output is essentially the same as that of Bahmani et al. (VLDB 2012), while it is 2 times faster on large graphs.
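To make the directed objective concrete: for two vertex sets $S$ and $T$, the edge-to-vertex ratio is commonly formulated (an assumption here, not stated in the abstract) as $\rho(S, T) = |E(S, T)| / \sqrt{|S| \cdot |T|}$, where $E(S, T)$ is the set of edges from $S$ to $T$. The Python sketch below computes this density and runs a multi-pass batch-peeling baseline in the spirit of Bahmani et al. (VLDB 2012), the prior work the abstract compares against. It is not the authors' single-pass algorithm. For each guess $c$ of the ratio $|S|/|T|$, every round reads the edge list once (mirroring one streaming pass) and removes, from $S$ if $|S|/|T| \ge c$ and from $T$ otherwise, all vertices whose degree is at most $(1+\epsilon)$ times the current average. The function names `directed_density`, `peel_for_ratio`, and `densest_directed`, and the geometric grid of guesses for $c$, are hypothetical illustration choices.

```python
import math
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]  # directed edge u -> v
Result = Tuple[float, Set[int], Set[int]]  # (density, S, T)


def directed_density(edges: List[Edge], S: Set[int], T: Set[int]) -> float:
    """rho(S, T) = |E(S, T)| / sqrt(|S| * |T|); E(S, T) = edges u -> v with u in S, v in T."""
    if not S or not T:
        return 0.0
    count = sum(1 for u, v in edges if u in S and v in T)
    return count / math.sqrt(len(S) * len(T))


def peel_for_ratio(edges: List[Edge], c: float, eps: float) -> Result:
    """Hypothetical batch peeling for one guess c of |S|/|T|; each round is one pass over edges."""
    S = {u for u, _ in edges}  # vertices with at least one out-edge
    T = {v for _, v in edges}  # vertices with at least one in-edge
    best: Result = (directed_density(edges, S, T), set(S), set(T))
    while S and T:
        # One pass: out-degrees into T, in-degrees from S, and m = |E(S, T)|.
        out_deg: Dict[int, int] = {u: 0 for u in S}
        in_deg: Dict[int, int] = {v: 0 for v in T}
        m = 0
        for u, v in edges:
            if u in S and v in T:
                out_deg[u] += 1
                in_deg[v] += 1
                m += 1
        if len(S) / len(T) >= c:
            # Remove every S-vertex with out-degree <= (1 + eps) * average out-degree.
            threshold = (1 + eps) * m / len(S)
            S = {u for u in S if out_deg[u] > threshold}
        else:
            # Remove every T-vertex with in-degree <= (1 + eps) * average in-degree.
            threshold = (1 + eps) * m / len(T)
            T = {v for v in T if in_deg[v] > threshold}
        rho = directed_density(edges, S, T)
        if rho > best[0]:
            best = (rho, set(S), set(T))
    return best


def densest_directed(edges: List[Edge], eps: float = 0.1) -> Result:
    """Try guesses c = (1 + eps)^k / n for k = 0, 1, ... while c <= n; keep the densest (S, T)."""
    if not edges:
        return (0.0, set(), set())
    n = len({x for e in edges for x in e})
    best: Result = (0.0, set(), set())
    c = 1.0 / n
    while c <= n:
        candidate = peel_for_ratio(edges, c, eps)
        if candidate[0] > best[0]:
            best = candidate
        c *= 1 + eps
    return best


if __name__ == "__main__":
    # Complete bipartite orientation {0, 1} -> {2, 3, 4}: the optimum density is sqrt(6).
    edges = [(u, v) for u in (0, 1) for v in (2, 3, 4)]
    print(densest_directed(edges))
```

Since at most $|S|/(1+\epsilon)$ vertices can have degree above $(1+\epsilon)$ times the average, each round shrinks one side by a $(1+\epsilon)$ factor, so this baseline needs $O(\log_{1+\epsilon} n)$ passes per guess. That repeated reading of the stream is the cost the single-pass result above avoids on randomly ordered streams.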