ASE2025

Efficient Understanding of Machine Learning Model Mispredictions

Martin Eberlein, Jürgen Cito, Lars Grunske

Abstract

Mispredictions by machine learning components can have severe consequences, especially in safety-critical and mission-critical software systems. Understanding and debugging these mispredictions is therefore a crucial part of developing systems that use machine learning components. Previous research has successfully identified when a model's predictions may be unreliable by generating a rule set that links feature values to prediction errors. However, current state-of-the-art rule set approaches require significant computational resources, particularly for large data sets. To reduce these computational demands, we propose a strategy that identifies the most influential features that lead to mispredictions and focuses only on them. Additionally, to improve the accuracy of misprediction diagnosis, we replace traditional rule-based approaches with decision tree learning. We evaluate our tool MMDFAST across 11 diverse real-world data sets. The results show that focusing on influential features with decision trees improves the accuracy of misprediction explanations while significantly reducing computational demands in all scenarios. Thus, MMDFAST produces more accurate misprediction explanations in less time than rule set approaches, making it both more efficient and more effective.
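To make the two ideas in the abstract concrete, the following is a minimal sketch and not the MMDFAST implementation. It assumes numeric features, a boolean label per sample marking whether the model mispredicted it, and information gain as a stand-in for feature influence; the abstract does not say how influence is measured. All function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values()) if n else 0.0


def best_split(rows, labels, feature):
    """Best `row[feature] <= t` split as (information gain, t); t is None if no split helps."""
    base, best = entropy(labels), (0.0, None)
    values = sorted({r[feature] for r in rows})
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for r, y in zip(rows, labels) if r[feature] <= t]
        right = [y for r, y in zip(rows, labels) if r[feature] > t]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if base - remainder > best[0]:
            best = (base - remainder, t)
    return best


def select_features(rows, labels, k):
    """Assumed influence measure: keep the k features with the highest single-split gain."""
    ranked = sorted(rows[0], key=lambda f: best_split(rows, labels, f)[0], reverse=True)
    return ranked[:k]


def fit_tree(rows, labels, features, depth):
    """Shallow tree: a leaf is a bool, an inner node is (feature, t, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    gain, t, feature = max(
        ((*best_split(rows, labels, f), f) for f in features), key=lambda s: s[0]
    )
    if t is None:  # no feature separates the labels any further
        return majority
    left = [(r, y) for r, y in zip(rows, labels) if r[feature] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[feature] > t]
    return (
        feature, t,
        fit_tree([r for r, _ in left], [y for _, y in left], features, depth - 1),
        fit_tree([r for r, _ in right], [y for _, y in right], features, depth - 1),
    )


def rules(tree, path=()):
    """Conjunctions of conditions that lead to a 'mispredicted' leaf."""
    if isinstance(tree, bool):
        return [" and ".join(path)] if tree else []
    feature, t, left, right = tree
    return rules(left, path + (f"{feature} <= {t}",)) + rules(right, path + (f"{feature} > {t}",))


# Toy data (hypothetical): the model under test fails on samples with high "age".
rows = [
    {"age": 25, "income": 30, "noise": 3}, {"age": 35, "income": 80, "noise": 1},
    {"age": 45, "income": 50, "noise": 4}, {"age": 55, "income": 20, "noise": 2},
    {"age": 65, "income": 60, "noise": 3}, {"age": 70, "income": 40, "noise": 1},
    {"age": 75, "income": 90, "noise": 4}, {"age": 80, "income": 10, "noise": 2},
]
mispredicted = [False] * 4 + [True] * 4

selected = select_features(rows, mispredicted, k=1)   # step 1: restrict the search space
tree = fit_tree(rows, mispredicted, selected, depth=2)  # step 2: learn on selected features
print(selected, rules(tree))
```

The cost saving in this sketch comes from the second step: the tree learner only searches splits over the `k` selected features rather than over every column, which is the intuition behind focusing on influential features before learning the explanation.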