WWW2026

TraceLLM: Evaluating and Exploring Large Language Models on Trace Analysis in Microservice-based Web Applications

Tong Zhou, Xin Peng, Jie Zhang, Chaofeng Sha, Chenxi Zhang, Zicheng Yuan, Senyu Xie

Abstract

Trace analysis is essential for understanding system behavior, detecting anomalies, and diagnosing faults in complex microservice-based web applications. Existing trace analysis approaches face several challenges in industrial microservice-based systems, including high manual overhead, limited functionality, unfriendly interaction mechanisms, and difficulties in deployment and integration. The strong capabilities of large language models (LLMs) in natural language understanding, reasoning, and multi-task generalization offer new opportunities for more intelligent and flexible trace analysis. However, the trace analysis capabilities of LLMs remain underexplored and underdeveloped. To bridge this gap, we conduct the first comprehensive evaluation of the trace analysis capabilities of LLMs. In particular, we construct TraceBench, the first instruction-response benchmark dataset for trace analysis. It covers a wide range of trace analysis tasks, allowing us to systematically evaluate the capabilities of LLMs in this area. Experimental results show that LLMs have potential in handling trace analysis tasks, but that there is still room for improvement. To this end, we propose TraceLLM, an approach that significantly enhances the trace analysis capabilities of LLMs via fine-tuning, outperforming open-source LLMs by 34.77% on average in terms of accuracy and outperforming the closed-source model by 21.66% in the best case. Our experiments also confirm the generalization and robustness of TraceLLM. To the best of our knowledge, TraceLLM is the first LLM specialized for handling various types of trace analysis tasks. This work provides a foundation for future research on the trace analysis capabilities of LLMs.