ASE2025
Finding Keywords for Architectural Erosion Detection in GitHub Commits for Android Applications
Juan Camilo Acosta-Rojas, Camilo Andrés Escobar-Velasquez
摘要
Empirical studies on the engineering of Android apps need to be based on open datasets and tools to allow comparisons, improve generalizability, and enable replicability. However, obtaining a good dataset is problematic and this state of things slows down empirical research on this topic. In this paper, we contribute to overcome this challenge by presenting the first, self-contained, publicly available dataset weaving spread-out data sources about real-world, open-source Android apps. Our dataset is encoded as a graph-based database and contains the following information about 8,431 real open-source Android apps: (i) metadata about their GitHub projects, (ii) Git repositories with full commit history and (iii) metadata extracted from the Google Play store, such as app ratings and permissions. The dataset is available in Docker images to ease adoption.