ASE 2025
Hypergraph Neural Network-based Multi-Granular Root Cause Localization for Microservice Systems
Yaxiao Li, Lu Wang, Chenxi Zhang, Qingshan Li, Siming Rong, Baiyang Wen, Xuyang Li, Kun Ma, Quanwei Du, KeYang Li, Lingfeng Pan, Xinyue Li, Mingxuan Hui
Abstract
Modern enterprises are increasingly adopting microservice architectures to enhance system flexibility and scalability. However, as business requirements keep changing, the relationships between system components have grown increasingly complex, making system robustness significantly harder to maintain. In recent years, multimodal data-driven approaches based on graph neural networks have emerged as a predominant solution for root cause localization in microservice systems. Our detailed analysis of architectural characteristics and existing research reveals two critical limitations. First, simple graphs are insufficient to represent the one-to-many relationships inherent in microservice component interactions, such as deployment, subordinate, and dependency relationships. Second, current multimodal data-based methods have difficulty localizing faults that occur on hosts, services, and instances at the same time. To address these challenges, we propose HyperRCA, a novel multi-granular root cause analysis approach based on hypergraph neural networks. Our approach models system states during faults via a hypergraph with instances as graph nodes, explicitly capturing heterogeneous relationships through three innovative hyperedge designs: deployment hyperedges for infrastructure relationships, subordinate hyperedges for service hierarchies, and dependency hyperedges for inter-component interactions. We use hypergraph neural networks and multi-layer perceptrons to train a root cause localization model based on hyperedge features, achieving multi-granularity root cause localization. Experimental evaluations demonstrate significant performance improvements over state-of-the-art approaches: HyperRCA achieves a maximum HR@5 improvement of 112.62% on single-granularity datasets and 466.43% in multi-granularity scenarios.
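The hypergraph modeling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy topology, the names `INSTANCES`, `CALLS`, `build_hyperedges`, and `incidence_matrix`, and the choice to form one dependency hyperedge per caller together with its callees are all assumptions made for illustration. The sketch only shows how instance nodes might be grouped into deployment, subordinate, and dependency hyperedges and encoded as a node-by-hyperedge incidence matrix, the usual input to a hypergraph neural network.

```python
from collections import defaultdict

# Hypothetical toy topology: (instance, service, host). Not from the paper.
INSTANCES = [
    ("cart-0", "cart", "host-a"),
    ("cart-1", "cart", "host-b"),
    ("pay-0", "pay", "host-a"),
    ("db-0", "db", "host-b"),
]
# Hypothetical call relations: caller instance -> callee instances.
CALLS = {"cart-0": ["pay-0", "db-0"], "cart-1": ["pay-0"]}


def build_hyperedges(instances, calls):
    """Group instance nodes into three kinds of hyperedges."""
    edges = defaultdict(set)
    for name, service, host in instances:
        # Deployment hyperedge: all instances placed on the same host.
        edges[("deployment", host)].add(name)
        # Subordinate hyperedge: all instances belonging to the same service.
        edges[("subordinate", service)].add(name)
    for caller, callees in calls.items():
        # Dependency hyperedge (assumed form): a caller plus its callees.
        edges[("dependency", caller)].update([caller, *callees])
    return dict(edges)


def incidence_matrix(nodes, edges):
    """Return sorted hyperedge keys and H, where H[i][j] = 1 if node i is in hyperedge j."""
    keys = sorted(edges)
    H = [[1 if node in edges[key] else 0 for key in keys] for node in nodes]
    return keys, H


nodes = [name for name, _, _ in INSTANCES]
edges = build_hyperedges(INSTANCES, CALLS)
keys, H = incidence_matrix(nodes, edges)
```

Because each deployment hyperedge corresponds to a host and each subordinate hyperedge to a service, scoring hyperedges as well as nodes is one plausible way a single model could rank hosts, services, and instances together; the abstract does not spell out this mechanism, so treat it as an interpretation.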