ICML 2024
Faster Streaming and Scalable Algorithms for Finding Directed Dense Subgraphs in Large Graphs
Slobodan Mitrovic, Theodore Pan
Abstract
Finding dense subgraphs is a fundamental algorithmic tool in data mining, community detection, and clustering. In this problem, one aims to find an induced subgraph whose edge-to-vertex ratio is maximized. We study the directed case of this question in the context of semi-streaming and massively parallel algorithms. In particular, we show that it is possible to find a approximation on randomized streams even in a single pass by using memory on -vertex graphs. Our result improves over prior works, which were designed for arbitrary-ordered streams: the algorithm by Bahmani et al. (VLDB 2012) which uses passes, and the work by Esfandiari et al. (2015) which makes one pass but uses memory. Moreover, our techniques extend to the Massively Parallel Computation model yielding rounds in the super-linear and rounds in the nearly-linear memory regime. This constitutes a quadratic improvement over state-of-the-art bounds by Bahmani et al. (VLDB 2012 and WAW 2014), which require rounds even in the super-linear memory regime. Finally, we empirically evaluate our single-pass semi-streaming algorithm on benchmarks and show that, even on non-randomly ordered streams, the quality of its output is essentially the same as that of Bahmani et al. (VLDB 2012) while it is times faster on large graphs.
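To make the objective concrete, the sketch below computes the edge-to-vertex ratio of an induced subgraph and runs greedy peeling, a standard offline technique for the undirected version of the problem. It is not the streaming or MPC algorithm of this paper, and it does not handle the directed density notion the paper studies. The function names `density` and `greedy_peel` are hypothetical, and the input is assumed to be a simple undirected graph given as a list of vertex pairs.

```python
from collections import defaultdict


def density(edges, vertices):
    """Edge-to-vertex ratio of the subgraph induced by `vertices`.

    Assumes `edges` lists each undirected edge once, with no self-loops.
    """
    vs = set(vertices)
    if not vs:
        return 0.0
    inside = sum(1 for u, v in edges if u in vs and v in vs)
    return inside / len(vs)


def greedy_peel(edges):
    """Repeatedly delete a minimum-degree vertex and keep the densest
    vertex set seen along the way. Returns (vertex_set, density)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    remaining = set(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # current edge count
    best_density, best_set = 0.0, set()

    while remaining:
        d = m / len(remaining)
        if d > best_density:
            best_density, best_set = d, set(remaining)
        # Linear scan for the minimum-degree vertex: simple, but quadratic overall.
        u = min(remaining, key=lambda x: len(adj[x]))
        for w in adj[u]:
            adj[w].discard(u)
        m -= len(adj[u])
        remaining.remove(u)
        del adj[u]

    return best_set, best_density
```

For example, on a 4-clique with a two-edge path attached, `greedy_peel` peels the path vertices first and returns the clique, whose ratio is 6 edges over 4 vertices. The linear scan for the minimum-degree vertex keeps the sketch short; a bucket queue would be the usual choice for large graphs.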