ASE2025
Automated Inline Comment Smell Detection and Repair with Large Language Models
Hatice Kübra Çaglar, Semih Çaglar, Eray Tüzün
1 citation
Abstract
Large Language Models (LLMs) have demonstrated impressive capabilities in many tasks such as code generation and automated program repair. However, code LLMs have ignored another important task in programmers' daily development work: improving the maintainability, readability, and scalability of programs. All of these characteristics are related to code smells, and we study how to improve them by detecting and removing code smells. Most work on code smells still relies on expert-formulated measures as features and makes little use of the rich prior knowledge contained in code LLMs. In this paper, we propose SmellDetector, a comprehensive model for detecting both code smells and refactoring opportunities in Java. We train the model with a designed prompt that covers both class-level and method-level code smells in the same code snippet, spanning more than 20 types. We achieve state-of-the-art performance on the code smell detection task and change the basic paradigm of code smell detection from a binary classification problem to multi-label classification. Finally, our experiments verify that good code smell detection helps to detect refactoring opportunities.
a software system (Foster et al., 2012) and harming its maintainability and evolution (Sjøberg et al., 2012). In other words, a code smell does not currently prevent the program from running and producing correct results, but it hinders its further development and iteration. Many researchers have paid attention to the problem of code smells since as early as the turn of the millennium, and proposed that correct code smell identification can help provide reasonable locations and opportunities for code refactoring (Fowler and Beck, 1997).
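The shift from binary to multi-label classification stated in the abstract can be illustrated with a minimal sketch. The smell names below are an illustrative subset chosen for this example, since the 22 types in the dataset are not enumerated in this excerpt. The helper functions are hypothetical and are not part of SmellDetector.

```python
# Illustrative label set (an assumption for this sketch, not the paper's list).
SMELL_TYPES = ["Long Method", "Large Class", "Feature Envy", "Data Class"]


def binary_label(smells_present, target):
    """Binary formulation: one yes/no answer per (snippet, smell type) pair."""
    return int(target in smells_present)


def multi_label_vector(smells_present):
    """Multi-label formulation: one 0/1 indicator per smell type for a snippet."""
    unknown = set(smells_present) - set(SMELL_TYPES)
    if unknown:
        raise ValueError(f"unknown smell types: {sorted(unknown)}")
    return [int(name in smells_present) for name in SMELL_TYPES]


# A hypothetical snippet annotated with two co-occurring smells.
snippet_smells = {"Long Method", "Feature Envy"}

binary = binary_label(snippet_smells, "Large Class")  # 0
vector = multi_label_vector(snippet_smells)           # [1, 0, 1, 0]
```

Under the binary formulation a separate decision is needed for every smell type, whereas the multi-label vector records all smells that co-occur in one snippet in a single target.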
The traditional method calculates various metrics for the code, such as LCOM (Lack of Cohesion in Methods) and NMD (number of methods declared), and determines whether the code has a certain code smell based on whether those metrics reach a threshold. When machine learning and deep learning algorithms became popular, many researchers fed code smell metrics into models as training features to avoid the instability caused by directly selecting thresholds (Jha et al., 2019; Sharma et al., 2021). In addition, an important direction in code refactoring research is finding refactoring opportunities, which is usually treated as binary classification: various metrics of program fragments are calculated to predict whether a specific refactoring method should be applied (Aniche et al., 2020).
However, some of the above methods have drawbacks: they rely on expert-designed measures as features, which is not in line with the current trend of LLM development. Moreover, code refactoring opportunity detection and code smell detection lack a good connection that would make them mutually reinforcing, although the two tasks are essentially complementary in the information they use.
In this paper, we present SmellDetector, a comprehensive code smell detection and elimination model, which aims to provide LLM-based adapters for detecting code smell types and finding refactoring opportunities.
We summarize our contributions below:
• We propose the first model based on code LLM fine-tuning for code smell detection and refactoring opportunity detection. Our training dataset and method are general and can easily be applied to other, more capable LLMs. The model achieves state-of-the-art performance on the code smell detection task.
• We collect and organize the first hierarchical code smell dataset from previous datasets. It contains multiple code smells in the same code snippet and covers 212,612 code smells of 22 types.
• We experimentally show that effective code smell detection helps in detecting code refactoring opportunities, which suggests to researchers that the two tasks should be reasonably combined.
2 Related Work
2.1 Code Smell Detection
A code smell is considered an inadequate implementation and design in code (Fowler and Beck, 1997) that brings various hazards, such as damaging code readability and maintainability. Beck et al. provide detailed natural-language definitions of 22 code smells. To automate the batch detection of code smells, Moha et al. proposed a method that calculates program metrics and determines whether they have reached a preset threshold. Additionally, Palomba et al. use history information to detect code smells, inspiring many researchers.
Recently, Large Language Models (LLMs) have achieved excellent performance in code generation tasks, such as codellama