WWW2024

λGrapher: A Resource-Efficient Serverless System for GNN Serving through Graph Sharing

Haichuan Hu, Fangming Liu, Qiangyu Pei, Yongjie Yuan, Zichen Xu, Lin Wang


Abstract

Graph Neural Networks (GNNs) have been increasingly adopted for graph analysis in web applications such as social networks. Yet efficient GNN serving remains a critical challenge due to high workload fluctuations and intricate GNN operations. Serverless computing, thanks to its flexibility and agility, offers on-demand serving of GNN inference requests. Alas, the request-centric serverless model is still too coarse-grained to avoid resource waste. Observing significant data locality in the computation graphs of requests, we propose λGrapher, a serverless system for GNN serving that achieves resource efficiency through graph sharing and fine-grained resource allocation. λGrapher features the following designs: (1) adaptive timeout for request buffering to balance resource efficiency and inference latency, (2) graph-centric scheduling to minimize computation and memory redundancy, and (3) resource-centric function management, with fine-grained resource allocation catered to the resource sensitivities of GNN operations and function orchestration optimized to hide communication latency. We implement a prototype of λGrapher on Knative, a representative open-source serverless platform, and evaluate it with real-world traces from various web applications. Our results show that λGrapher achieves average savings of 61.5% in memory resources and 47.2% in computing resources compared with the state of the art, while preserving GNN inference latency.
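The abstract attributes λGrapher's savings to data locality across the computation graphs of buffered requests. The sketch below only illustrates why sharing helps; it is not λGrapher's scheduler. It assumes that each request asks for the embedding of one target node under a k-layer message-passing GNN, so the request's computation graph is the target's k-hop neighborhood. It then compares the number of nodes processed when each request builds its own graph with the number processed when the batch shares one merged graph. The function names `khop_nodes` and `sharing_savings` and the toy adjacency list are hypothetical.

```python
from collections import deque


def khop_nodes(adj, root, k):
    """Return the set of nodes within k hops of root.

    Under the stated assumption, this is the node set of the root's
    k-layer GNN computation graph.
    """
    seen = {root}
    frontier = deque([(root, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen


def sharing_savings(adj, requests, k):
    """Compare node counts with and without graph sharing.

    Returns (separate, shared): `separate` sums the node counts of each
    request's own computation graph; `shared` counts the nodes in the
    union, i.e. each overlapping node is loaded and computed only once.
    """
    per_request = [khop_nodes(adj, r, k) for r in requests]
    separate = sum(len(nodes) for nodes in per_request)
    shared = len(set().union(*per_request))
    return separate, shared


if __name__ == "__main__":
    # Hypothetical toy graph as an undirected adjacency list.
    adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
    print(sharing_savings(adj, requests=[0, 2, 4], k=2))  # (11, 5)
```

In this toy example, three 2-hop requests touch 11 nodes when served separately but only 5 distinct nodes when their graphs are merged. This is the kind of redundancy that graph-centric scheduling targets. A real system must also weigh how long to buffer requests before merging, which is the trade-off behind the adaptive timeout.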