VLDB2025
GraphCSR: A Degree-Equalized CSR Format for Large-scale Graph Processing
Xinbiao Gan, Tiejun Li, Chunye Gong, Dongsheng Li, Dezun Dong, Jie Liu, Kai Lu
14 citations
Abstract
Graph processing underpins a vast array of data-centric applications, serving as a crucial component in fields such as social network analysis, recommendation systems, bio-informatics, and search engines. As graph data grows in scale and complexity, high-performance graph processing is increasingly essential. Many graph processing tasks depend on efficient data structures to manage the sparsity typical of real-world graphs, where most vertices have limited connectivity. This sparsity poses challenges for memory and computational efficiency in large-scale graph processing, and conventional sparse formats like Compressed Sparse Row (CSR) often struggle with memory and computation inefficiencies when handling massive graphs. To address these challenges, we introduce GraphCSR, a degree-equalized CSR format specifically tailored to enhance the spatio-temporal efficiency of distributed graph processing across various tasks. GraphCSR aggregates low-degree vertices into synthetic high-degree ones and applies group-wise compression to reduce storage overhead by recording only the starting index for each aggregated group. This reduces memory usage and supports batch-memory access to improve performance. Our extensive evaluations in various graph processing algorithms and datasets demonstrate that GraphCSR not only reduces the memory footprint required for large-scale graphs, but also improves performance across multiple types of graph processing tasks, outperforming popular sparse storage formats. Furthermore, when deployed on a production-scale supercomputer with 79,024 nodes, GraphCSR achieved a graph processing throughput that exceeded the top-ranked system on the Graph500 benchmark.