FSE 2025

TracePicker: Optimization-Based Trace Sampling for Microservice-Based Systems

Shuaiyu Xie, Jian Wang, Maodong Li, Peiran Chen, Jifeng Xuan, Bing Li

Abstract

Distributed tracing is a pivotal technique for software operators to understand and diagnose issues within microservice-based systems, offering a comprehensive view of user requests as they propagate through various services. However, the unprecedented volume of traces imposes expensive storage and analysis burdens on online systems. Conventional tracing approaches typically rely on random sampling with a fixed probability for each trace, which risks missing valuable traces. Several tail-based sampling methods have thus been proposed to sample traces based on their content. Nevertheless, these methods primarily evaluate traces individually, neglecting the collective attributes of the sample set in terms of comprehensiveness, balance, and consistency. To address these issues, we propose TracePicker, an optimization-based online sampler designed to enhance the quality of sampled data while reducing the storage burden. TracePicker employs a streaming anomaly detector to capture and retain anomalous traces that are crucial for troubleshooting. For normal traces, the sampling process is divided into quota allocation and group sampling, both formulated as integer programming problems. By solving these problems using dynamic programming and evolutionary algorithms, TracePicker selects a high-quality subset of data, minimizing overall information loss. Experimental results demonstrate that TracePicker outperforms existing tail-based sampling methods in terms of both sampling quality and time consumption.
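
To make the quota-allocation idea concrete, the following is a minimal sketch of how an integer allocation problem of this general shape can be solved with dynamic programming. It is not TracePicker's actual formulation: the abstract does not state the objective or constraints, so the grouping of normal traces, the function name `allocate_quotas`, and the logarithmic utility (chosen only because it rewards balance across groups) are all assumptions made for illustration.

```python
import math
from typing import Callable, List


def allocate_quotas(
    group_sizes: List[int],
    budget: int,
    utility: Callable[[int, int], float],
) -> List[int]:
    """Pick integer quotas q[g] with 0 <= q[g] <= group_sizes[g] and
    sum(q) <= budget that maximize sum(utility(g, q[g])).

    Hypothetical helper: the objective is a stand-in, not the paper's.
    """
    n = len(group_sizes)
    neg_inf = float("-inf")
    # best[g][b]: best total utility using the first g groups and exactly b slots.
    best = [[neg_inf] * (budget + 1) for _ in range(n + 1)]
    # choice[g][b]: quota given to group g-1 in the best solution for (g, b).
    choice = [[0] * (budget + 1) for _ in range(n + 1)]
    best[0][0] = 0.0

    for g in range(1, n + 1):
        for b in range(budget + 1):
            for q in range(min(group_sizes[g - 1], b) + 1):
                prev = best[g - 1][b - q]
                if prev == neg_inf:
                    continue  # state (g-1, b-q) is unreachable
                value = prev + utility(g - 1, q)
                if value > best[g][b]:
                    best[g][b] = value
                    choice[g][b] = q

    # The budget is an upper bound, so take the best reachable slot count.
    b = max(range(budget + 1), key=lambda used: best[n][used])

    # Walk the choice table backwards to recover the per-group quotas.
    quotas = [0] * n
    for g in range(n, 0, -1):
        quotas[g - 1] = choice[g][b]
        b -= quotas[g - 1]
    return quotas


def log_utility(group: int, quota: int) -> float:
    """Assumed diminishing-returns utility: each extra trace from the same
    group adds less information than the previous one."""
    return math.log1p(quota)


# Three groups of normal traces with very different sizes, 7 storage slots.
print(allocate_quotas([100, 10, 1], 7, log_utility))  # [3, 3, 1]
```

Under the assumed utility, the rare third group keeps its single trace and the remaining slots are split evenly between the two larger groups, whereas sampling each trace with the same fixed probability would mostly return traces from the largest group. The table has (groups + 1) x (budget + 1) entries, which keeps the exact solution tractable for modest budgets.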