ICSE2025

A Tale of Two DL Cities: When Library Tests Meet Compiler

Qingchao Shen, Yongqiang Tian, Haoyang Ma, Junjie Chen, Lili Huang, Ruifeng Fu, Shing-Chi Cheung, Zan Wang


Abstract

Deep Learning (DL) compilers typically load a DL model and optimize it through intermediate representations. Existing DL compiler testing techniques mainly focus on the model optimization stages and rarely explore bug detection at the model loading stage. Effectively testing the model loading stage requires covering diverse usages of each DL operator from various DL libraries. DL library testing shares this objective, which indicates that the knowledge embedded in DL library tests is beneficial for testing the model loading stage of DL compilers. Based on this idea, we propose OPERA to migrate the knowledge embedded in DL library tests to test the model loading stage. OPERA constructs diverse tests from various tests for DL libraries (including the tests documented in DL libraries and those generated by recent fuzzers). In total, we considered three sources of tests in DL libraries for migration. In addition, OPERA incorporates a diversity-based test prioritization strategy so that the tests more likely to detect diverse bugs are migrated and executed earlier. We then used eight frontends from three DL compilers (i.e., TVM, TensorRT, and OpenVINO) for evaluation. OPERA detected 170 previously unknown bugs in total, 90 of which have been confirmed/fixed by developers, demonstrating the effectiveness of the migration-based idea. The test prioritization strategy in OPERA improves testing efficiency with migrated tests by 11.9% to 47.4% on average compared to general test prioritization strategies.
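
To make the idea of diversity-based test prioritization concrete, the following is a minimal sketch of a generic greedy ordering, not OPERA's actual algorithm, which the abstract does not specify. It assumes each migrated test can be summarized as a set of feature strings (operator name, parameter settings, data type); the function name `prioritize_by_diversity`, the feature encoding, and the example tests are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def prioritize_by_diversity(tests: Dict[str, FrozenSet[str]]) -> List[str]:
    """Order tests so that each next test adds the most not-yet-seen features.

    This is a generic greedy "additional coverage" heuristic, used here only
    as an illustration; it is an assumption, not the strategy from the paper.
    """
    remaining = dict(tests)
    seen: Set[str] = set()
    order: List[str] = []
    while remaining:
        # Pick the test contributing the most unseen features;
        # ties are broken by test name to keep the order deterministic.
        best = min(remaining, key=lambda name: (-len(remaining[name] - seen), name))
        order.append(best)
        seen |= remaining.pop(best)
    return order


# Hypothetical migrated tests, each described by the operator usage it exercises.
tests = {
    "conv2d_default": frozenset({"op:conv2d", "padding:0", "dtype:float32"}),
    "conv2d_padded": frozenset({"op:conv2d", "padding:1", "dtype:float32"}),
    "relu_basic": frozenset({"op:relu", "dtype:float32"}),
    "conv2d_fp16_dilated": frozenset(
        {"op:conv2d", "padding:1", "dtype:float16", "dilation:2"}
    ),
}

order = prioritize_by_diversity(tests)
print(order)
```

In this toy example, the test exercising the largest number of distinct operator usages runs first, and a test whose features are already fully covered by earlier tests is pushed to the end of the queue.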