ASE 2021

An Investigation of Compound Variable Names Toward Automated Detection of Confusing Variable Pairs

Hirohisa Aman, Sousuke Amasaki, Tomoyuki Yokogawa, Minoru Kawahara

Abstract

Naming variables well is key to making source code readable. Programmers often concatenate two or more words into a compound variable name to make it more informative and easier to understand. Although each compound variable name may be easy to understand on its own, two such variables can form a “confusing” variable pair when their names are highly similar, e.g., “shippingHeight” vs. “shippingWeight.” A confusing variable pair can adversely affect code readability because it may cause a misreading or a mix-up of the variables. Toward automated support for enhancing code readability, this paper conducts a large-scale investigation of compound variable names in Java programs to find quantitative criteria for confusing variable pairs. The investigation collects 31,806,749 pairs of compound-named variables from 684 open-source Java projects and analyzes them from two perspectives of name similarity: string similarity and semantic similarity.
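
To make the string-similarity perspective concrete, the following is a minimal sketch in Python. The abstract does not state which metric the paper uses, so the length-normalized Levenshtein distance here is an assumption, and the helper names (split_compound, levenshtein, string_similarity) are hypothetical. Semantic similarity is not covered by this sketch.

```python
import re


def split_compound(name):
    """Split a camelCase or snake_case identifier into lowercase words."""
    words = []
    for chunk in name.split("_"):
        # Acronym runs (e.g. "XML"), capitalized or lowercase words, digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [w.lower() for w in words]


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def string_similarity(a, b):
    """Assumed metric: 1 - distance / length of the longer name, in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    a, b = "shippingHeight", "shippingWeight"
    print(split_compound(a), split_compound(b))
    print(round(string_similarity(a, b), 3))
```

Under this assumed metric, the two example names differ by a single character out of 14, so they score close to 1. A detector along these lines would flag pairs above some threshold; the abstract excerpt does not give such a threshold, so none is suggested here.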