ASE2025
Automated Evolutionary Hyperparameter Tuning for NLP-Based Test Case Generation
Ivan P. Malashin, Igor S. Masich, Sergei Kurashkin, Andrei P. Gantimurov, Aleksey S. Borodulin, Vladimir A. Neluyb, Vadim Tynchenko
Abstract
Automated generation of executable test suites from natural-language requirements remains challenging because of linguistic ambiguity and the sensitivity of generative models to decoding and training hyperparameters. This paper introduces a hierarchical, multi-level evolutionary framework that treats model hyperparameters and decoding strategies as upper-level decision variables and employs lower-level fitness functions that directly measure test-quality objectives (structural coverage, semantic diversity, redundancy, and runtime efficiency). The approach integrates retrieval-augmented grounding, surrogate-assisted preselection, lightweight LoRA adaptation, and optional HIL evaluation. Empirical evaluation on the PURE, PROMISE_exp and FR_NFR benchmarks (repeated runs, n = 10; paired two-sided t-tests) shows consistent gains: on PURE, mean code coverage exceeds that of both Bayesian optimisation and random search (68.9%), with 145 unique scenarios and a modest runtime overhead relative to Bayesian optimisation. Ablations confirm the contribution of each component: removing the diversity objective reduces the number of unique scenarios, disabling the surrogate increases wall-clock time, and disabling RAG lowers grounded consistency. The results indicate that co-optimising hyperparameters for explicit test-quality metrics, together with grounding and realistic execution, yields more useful, executable test suites. Future work will explore adaptive objective weighting, transfer warm-starts, and probabilistic surrogates.
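To make the two-level structure concrete, the following is a minimal sketch rather than the authors' implementation. It assumes two hypothetical decoding hyperparameters (`temperature`, `top_p`) as upper-level decision variables and a synthetic stand-in, `evaluate_suite`, for the lower level, which in the framework would generate and execute tests. The metric formulas, the weights, and all function names are illustrative assumptions; surrogate preselection, RAG grounding, LoRA adaptation, and HIL evaluation are omitted.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Upper-level decision variables (hypothetical names and ranges)."""
    temperature: float
    top_p: float


def evaluate_suite(c: Candidate) -> dict:
    """Synthetic stand-in for the lower level.

    A real system would generate tests with these settings, run them,
    and measure the four objectives. The formulas here are toy assumptions.
    """
    return {
        "coverage": 1.0 - abs(c.temperature - 0.7),
        "diversity": c.top_p,
        "redundancy": max(0.0, 0.5 - c.temperature),
        "runtime": 0.2 * c.top_p,
    }


# Assumed weights: reward coverage and diversity, penalise redundancy and runtime.
WEIGHTS = {"coverage": 0.5, "diversity": 0.3, "redundancy": -0.1, "runtime": -0.1}


def fitness(c: Candidate) -> float:
    """Scalarise the test-quality metrics into a single score."""
    metrics = evaluate_suite(c)
    return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)


def clip(x: float, lo: float, hi: float) -> float:
    return min(hi, max(lo, x))


def mutate(c: Candidate, rng: random.Random, sigma: float = 0.1) -> Candidate:
    """Gaussian perturbation, clipped to the assumed valid ranges."""
    return Candidate(
        temperature=clip(c.temperature + rng.gauss(0.0, sigma), 0.1, 1.5),
        top_p=clip(c.top_p + rng.gauss(0.0, sigma), 0.1, 1.0),
    )


def evolve(generations: int = 20, pop_size: int = 12, seed: int = 0) -> Candidate:
    """Elitist evolutionary search over the upper-level variables."""
    rng = random.Random(seed)
    pop = [
        Candidate(rng.uniform(0.1, 1.5), rng.uniform(0.1, 1.0))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]  # the better half survives unchanged
        children = [mutate(rng.choice(elite), rng) for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print(best, round(fitness(best), 3))
```

A weighted sum is only one way to combine the objectives and is used here for brevity; a Pareto-based selection scheme would fit the same loop by replacing the sort key.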