VLDB2025
RapidStore: An Efficient Dynamic Graph Storage System for Concurrent Queries
Chiyu Hao, Jixian Su, Shixuan Sun, Hao Zhang, Sen Gao, Jianwen Zhao, Chenyi Zhang, Jieru Zhao, Chen Chen, Minyi Guo
被引用 2 次
摘要
Dynamic graph storage systems are essential for real-time applications such as social networks and recommendation, where graph data continuously evolves. However, they face significant challenges in efficiently handling concurrent read and write operations. We find that existing methods suffer from write queries interfering with read efficiency, substantial time and space overhead due to peredge versioning, and an inability to balance performance, such as slow searches under concurrent workloads. To address these issues, we propose RapidStore, a holistic approach for efficient in-memory dynamic graph storage designed for read-intensive workloads. Our key idea is to exploit the characteristics of graph queries through a decoupled system design that separates the management of read and write queries and decouples version data from graph data. Particularly, we design an efficient dynamic graph store to cooperate with the graph concurrency control mechanism. Experimental results demonstrate that RapidStore enables fast and scalable concurrent graph queries, effectively balancing the performance of inserts, searches, and scans, and significantly improving efficiency in dynamic graph storage systems. We focus on directed graphs, denoted as 𝐺 = (𝑉 , 𝐸), where 𝑉 represents the set of vertices and 𝐸 ⊆ 𝑉 × 𝑉 denotes the set of directed edges. An edge 𝑒 (𝑢, 𝑣) ∈ 𝐸 indicates a directed connection from 𝑢 to 𝑣. For any vertex 𝑢 ∈ 𝑉 , we define 𝑁 (𝑢) as the neighborhood of 𝑢, which includes all vertices 𝑣 such that 𝑒 (𝑢, 𝑣) ∈ 𝐸. The out-degree of vertex 𝑢, denoted as 𝑑 (𝑢), is given by |𝑁 (𝑢)|. In contrast, an undirected graph can be represented by storing edges in both directions, i.e., 𝑒 (𝑢, 𝑣) and 𝑒 (𝑣, 𝑢). Each vertex is assigned a unique integer ID within the range [0, |𝑉 |). Previous studies [1, 16, 18, 43, 45] have shown that using integer vertex IDs within this range offers significant computational and storage advantages due to their compact representation. Therefore, existing works [8, 10, 15, 25, 49] either require vertex IDs to fall within [0, |𝑉 |) or utilize dictionary encoding techniques to map external vertex IDs to this range. In this paper, we assume that each vertex ID 𝑢 ∈ 𝑉 is an integer in [0, |𝑉 |). An ID is an 4-bytes integer in the implementation. A dynamic graph 𝐺 = (𝐺 0 , ΔG) represents the evolution of a graph over time, where 𝐺 0 is the initial graph and ΔG is a sequence of updates. Each update Δ𝐺 𝑖 = (⊕, 𝑒) modifies the graph by either inserting or deleting 𝑒 (𝑢, 𝑣), with ⊕ = +/indicating the operation. By applying updates from Δ𝐺 0 to Δ𝐺 𝑖 , we obtain the graph 𝐺 𝑖 . Dynamic graph storage systems are designed to efficiently support both read and write operations: Search(u, v), which finds vertex 𝑣 in 𝑁 (𝑢), Scan(u), which traverses 𝑁 (𝑢), and Insert(u, v) and Delete(u, v), which add or remove a neighbor from 𝑁 (𝑢) (i.e., adding or removing an edge 𝑒 (𝑢, 𝑣)). Since graph applications are typically read-intensive, this paper focuses on scenarios with small updates and heavy read operations, consistent with prior works [8, 15, 49] . Nevertheless, RapidStore also efficiently supports batch updates involving tens of thousands of edges. Adaptive Radix Tree. The adaptive radix tree (ART) [22] is a highperformance data structure optimized for fast and memory-efficient key storage and retrieval. It builds on the radix tree by indexing keys through byte sequences. In a typical radix tree, each node may hold up to 256 child pointers (one for each byte value), which enables rapid lookups but can waste memory when many pointers are unused. ART overcomes this inefficiency by dynamically adapting the size of its nodes to the number of children. It defines four node types: N4, N16, N48, and N256, supporting up to 4, 16, 48, and 256 child pointers, respectively. Nodes automatically grow or shrink as keys are inserted or deleted, optimizing memory usage. Furthermore, ART employs path compression to reduce tree depth by compressing common prefixes shared by multiple keys into a single edge or node. Lazy expansion further collapses nodes by removing the path to single leaf. These two techniques reduce the number of nodes, speeding up searches. For 𝑛 elements in ART, each with a byte length of 𝑤, the time complexity for Search, Insert, and Delete operations is 𝑂 (𝑤), while Scan takes 𝑂 (𝑛). The space complexity is 𝑂 (𝑛).