ICSE2025

Ranking Relevant Tests for Order-Dependent Flaky Tests

Shanto Rahman, Bala Naren Chanumolu, Suzzana Rafi, August Shi, Wing Lam

1 citation

Abstract

One major challenge of regression testing are flaky tests, i.e., tests that may pass in one run but fail in another run for the same version of code. One prominent category of flaky tests is order-dependent (OD) flaky tests, which can pass or fail depending on the order in which the tests are run. To help developers debug and fix OD tests, prior work attempts to automatically find OD-relevant tests, which are tests that determine whether an OD test passes or fails, depending on whether the OD-relevant tests run before or after the OD test. Prior work found OD-relevant tests by running different tests before the OD test, without considering each test's likelihood of being OD-relevant tests. We propose RankF to rank tests in order of likelihood of being OD-relevant tests, finding the first OD-relevant test for a given OD test more quickly. We propose two ranking approaches, each requiring different information. Our first approach, <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> $\boldsymbol{RankF}_L$ </tex>, relies on training a large-language model to analyze test code. Our second approach, <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> $\boldsymbol{RankF}_O$ </tex>, relies on analyzing prior test-order execution information. We evaluate our approaches on 155 OD tests across 24 open-source projects. We compare RankF against baselines from prior work, where we find that RankF finds the first OD-relevant test for an OD test faster than the best baseline; depending on the type of OD-relevant test, RankF takes 9.4 to 14.1 seconds on median, compared to the baseline's 34.2 to 118.5 seconds on median.