ASE2024

The Importance of Accounting for Execution Failures when Predicting Test Flakiness

Guillaume Haben, Sarra Habchi, John Micco, Mark Harman, Mike Papadakis, Maxime Cordy, Yves Le Traon

被引用 2 次

摘要

Flaky tests are tests that pass and fail on different executions of the same version of a program under test. They waste valuable developer time by making developers investigate false alerts (flaky test failures). To deal with this issue, many prediction methods have been proposed. However, the utility of these methods remains unclear since they are typically evaluated based on single-release data, ignoring that in many cases tests that fail flakily in one release also correctly fail (indicating the presence of bugs) in some other, meaning that it is possible for subsequent correctly-failing cases to pass unnoticed. In this paper, we show that this situation is prevalent and can raise significant concerns for both researchers and practitioners. In particular, we show that flaky tests, tests that exhibit flaky behaviour at some point in time, have a strong faultrevealing capability, i.e., they reveal more than 1 /3 of all encountered regression faults. We also show that 76.2%, of all test executions that reveal faults in the codebase under test are made by tests that are classified as flaky by existing prediction methods. Overall, our findings motivate the need for future research to focus on predicting flaky test executions instead of flaky tests. CCS CONCEPTS • Software and its engineering → Software testing and debugging.