ASE2022

'Who built this crap?' Developing a Software Engineering Domain Specific Toxicity Detector

Jaydeb Sarker


Abstract

Since toxicity in developers’ interactions in open source software (OSS) projects negatively affects relationships among developers, the Software Engineering (SE) domain needs a toxicity detector of its own. However, prior studies found that contemporary toxicity detection tools performed poorly on SE texts. To address this challenge, I have developed ToxiCR, an SE-specific toxicity detector evaluated on 19,571 manually labeled code review comments. I evaluate ToxiCR with different combinations of ten supervised learning models, five text vectorizers, and eight preprocessing techniques (two of which are SE domain-specific). After evaluating all combinations, I found that ToxiCR significantly outperformed existing toxicity classifiers, achieving an accuracy of 95.8% and an F1 score of 88.9%.
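The evaluation described above amounts to a search over a configuration space, scored with accuracy and F1. The sketch below illustrates that idea; it is not ToxiCR's actual pipeline. All model, vectorizer, and preprocessing names are hypothetical placeholders, since the abstract gives only their counts. It also assumes that each preprocessing step can be switched on or off independently, which is one possible reading of "all possible combinations". Training and prediction are deliberately left out; only the grid enumeration and the two reported metrics are shown.

```python
from itertools import combinations, product

# Hypothetical placeholder names: the abstract gives only the counts
# (ten models, five vectorizers, eight preprocessing techniques).
MODELS = [f"model_{i}" for i in range(10)]
VECTORIZERS = [f"vectorizer_{i}" for i in range(5)]
PREPROCESSING_STEPS = [f"preprocess_{i}" for i in range(8)]


def preprocessing_subsets(steps):
    """Yield every subset of preprocessing steps, from none to all.

    Assumption: each step can be switched on or off independently.
    """
    for size in range(len(steps) + 1):
        yield from combinations(steps, size)


def configuration_grid():
    """Yield every (model, vectorizer, preprocessing subset) triple."""
    return product(MODELS, VECTORIZERS, preprocessing_subsets(PREPROCESSING_STEPS))


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 of the positive class) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Size of the naive full grid under the on/off assumption: 10 * 5 * 2**8.
    print(sum(1 for _ in configuration_grid()))
    # Toy labels (1 = toxic, 0 = non-toxic), not data from the study.
    print(accuracy_and_f1([1, 0, 0, 1, 0], [1, 0, 1, 0, 0]))  # (0.6, 0.5)
```

A real harness would train and score each configuration, for example with cross-validation, and then call a metric function like `accuracy_and_f1` on every fold. The study may also have restricted which models pair with which vectorizers, so the full grid here is only an upper bound under the stated assumption. Reporting F1 on the toxic class alongside accuracy matters when toxic comments are the minority class, because a classifier can reach high accuracy by mostly predicting "non-toxic".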