ISSTA 2024

Leveraging Natural Language Processing and Data Mining to Augment and Validate APIs

Alix Decrop

Abstract

APIs are increasingly prominent in modern web applications, allowing millions of users around the world to access data. Reducing the risk of API defects, and consequently failures, is key, notably for security, availability, and maintainability. Documenting an API is crucial, as it helps users understand the API. Moreover, API testing techniques often require formal documentation as input. However, documenting is a time-consuming and error-prone task that developers often overlook. Natural Language Processing (NLP) could assist API development, as recent Large Language Models (LLMs) have demonstrated exceptional abilities to automate tasks based on their colossal training data. Data mining could also be used to synthesize API information scattered across the web. Hence, I present my PhD project, which aims to explore the use of NLP-related technologies and data mining to augment and validate APIs. The research questions of this PhD project are: (1) What types of APIs can benefit from NLP and data mining assistance? (2) What API problems can be solved with such methods? (3) How effective are the methods (i.e., LLMs) in assisting APIs? (4) How efficient are the methods in assisting APIs (i.e., in terms of time and cost)?

CCS Concepts

• Computing methodologies → Natural language processing; • Software and its engineering → Software testing and debugging.