KDD2023
Root Cause Analysis for Microservice Systems via Hierarchical Reinforcement Learning from Human Feedback
Lu Wang, Chaoyun Zhang, Ruomeng Ding, Yong Xu, Qihang Chen, Wentao Zou, Qingjun Chen, Meng Zhang, Xuedong Gao, Hao Fan, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang
Cited by 23
Abstract
In microservice systems, identifying the root causes of anomalies is essential for maintaining service reliability and limiting business impact. This process is typically divided into two phases: (i) constructing a service dependency graph that outlines the sequence and structure of the system components that are invoked, and (ii) localizing the root-cause components using the graph, traces, logs, and Key Performance Indicators (KPIs) such as latency. However, neither phase is straightforward, owing to the highly dynamic and complex nature of these systems, particularly in large-scale commercial architectures such as Microsoft Exchange.
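To make the two phases concrete, here is a minimal sketch of a generic pipeline. It assumes traces have already been reduced to hypothetical (caller, callee) pairs and that each service has a single latency KPI with a known baseline. Phase (ii) uses a deliberately naive heuristic (report anomalous services none of whose direct callees are anomalous); it is NOT the hierarchical reinforcement learning from human feedback approach this paper proposes. All service names, numbers, the `factor` threshold, and the function names are illustrative assumptions.

```python
from collections import defaultdict


def build_dependency_graph(calls):
    """Phase (i): map each caller to the set of services it invokes."""
    graph = defaultdict(set)
    for caller, callee in calls:
        graph[caller].add(callee)
    return dict(graph)


def localize_root_causes(graph, latency_ms, baseline_ms, factor=2.0):
    """Phase (ii): naive heuristic for illustration, not the paper's method.

    A service is anomalous if its latency exceeds factor * its baseline.
    An anomalous service is reported as a root-cause candidate when none of
    its direct callees is anomalous, i.e. its anomaly does not appear to be
    inherited from further downstream.
    """
    anomalous = {
        svc for svc, lat in latency_ms.items()
        if lat > factor * baseline_ms[svc]
    }
    return sorted(
        svc for svc in anomalous
        if not (graph.get(svc, set()) & anomalous)
    )


# Hypothetical (caller, callee) pairs, as they might be extracted from traces.
calls = [
    ("frontend", "auth"),
    ("frontend", "mailbox"),
    ("mailbox", "storage"),
    ("auth", "directory"),
]
# Hypothetical observed and baseline latencies in milliseconds.
latency_ms = {"frontend": 900, "auth": 40, "mailbox": 850, "storage": 800, "directory": 20}
baseline_ms = {"frontend": 100, "auth": 30, "mailbox": 80, "storage": 50, "directory": 15}

graph = build_dependency_graph(calls)
print(localize_root_causes(graph, latency_ms, baseline_ms))  # ['storage']
```

Here `frontend`, `mailbox`, and `storage` all exceed twice their baselines, but only `storage` has no anomalous callee, so it is the single candidate. A heuristic this simple breaks down exactly where the abstract says the problem gets hard: dynamic topologies, cyclic calls, and noisy or missing KPIs.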