ICSE2025

Famos: Fault Diagnosis for Microservice Systems Through Effective Multi-Modal Data Fusion

Chiming Duan, Yong Yang, Tong Jia, Guiyang Liu, Jinbu Liu, Huxing Zhang, Qi Zhou, Ying Li, Gang Huang

3 citations

Abstract

Accurately diagnosing the fault that causes the failure is crucial for maintaining the reliability of a microservice system after a failure occurs. Mainstream fault diagnosis approaches are data-driven and mainly rely on three modalities of runtime data: traces, logs, and metrics. Diagnosing faults with multiple modalities of data in microservice systems has been a clear trend in recent years because different types of faults and corresponding failures tend to manifest in data of various modalities. Accurately diagnosing faults by fully leveraging multiple modalities of data is confronted with two challenges: 1) how to minimize information loss when extracting features for data of each modality; 2) how to correctly capture and utilize the relationships among data of different modalities. To address these challenges, we propose FAMOS, a Fault diagnosis Approach for MicrOservice Systems through effective multi-modal data fusion. On the one hand, FAMOS employs independent feature extractors to preserve the intrinsic features for each modality. On the other hand, FAMOS introduces a new Gaussian-attention mechanism to accurately correlate data of different modalities and then captures the inter-modality relationship with a crossattention mechanism. We evaluated FAMOS on two datasets constructed by injecting comprehensive and abundant faults into an open-source microservice system and a real-world industrial microservice system. Experimental results demonstrate the FAMOS's effectiveness in fault diagnosis, achieving significant improvements in F1 scores compared to state-of-the-art (SOTA) methods, with an increase of 20.33 %.