ICML2022

On the Effects of Artificial Data Modification

Antonia Marcu, Adam Prügel-Bennett

Cited by 3

Abstract

Data distortion is commonly applied to vision models during both training (e.g. methods like MixUp and CutMix) and evaluation (e.g. shape-texture bias and robustness). This data modification can introduce artificial information. It is often assumed that the resulting artefacts are detrimental to training but negligible when analysing models. We investigate these assumptions and conclude that in some cases they are unfounded and lead to incorrect results. Specifically, we show that current shape bias identification methods and occlusion robustness measures are biased, and we propose a fairer alternative for the latter. Through a series of experiments, we then seek to correct and strengthen the community's understanding of how augmentation affects the learning of vision models. Based on our empirical results, we argue that the impact of the artefacts must be understood and exploited rather than eliminated.

Motivation

Augmentation is commonplace when training models. It is a form of data modification in which samples are artificially distorted to create larger training sets. Beyond augmentation, data modification is also used in a wide range of model analysis methods. Most recently, distortion-based approaches have been adopted to answer key machine learning questions. For example, MixUp-like distortions (Zhang et al., 2018) have been proposed for empirically predicting generalisation (Schiff et al., 2021; Natekar & Sharma, 2020; Lassance et al., 2020). Data modification is therefore becoming increasingly popular, yet little attention is paid to the secondary effects of this practice. As we will demonstrate, our current understanding of the effects of data modification rests on fundamentally flawed assumptions.
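For readers unfamiliar with the two training-time distortions named above, the following is a minimal NumPy sketch of MixUp (Zhang et al., 2018) and CutMix as they are commonly described. It assumes single images stored as H×W×C arrays, one-hot label vectors, and a mixing coefficient drawn from Beta(α, α). The function names `mixup` and `cutmix` and the default `alpha=1.0` are illustrative assumptions, not code or settings from the paper.

```python
import numpy as np


def mixup(x1, y1, x2, y2, alpha=1.0, rng=None):
    """MixUp: a convex combination of two images and of their one-hot labels."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)  # mixing coefficient lambda ~ Beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam


def cutmix(x1, y1, x2, y2, alpha=1.0, rng=None):
    """CutMix: paste a random box from x2 into x1; mix labels by pixel area."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = x1.shape[:2]
    lam = rng.beta(alpha, alpha)
    # Box side lengths chosen so the box covers roughly (1 - lam) of the image.
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    cy, cx = rng.integers(h), rng.integers(w)  # random box centre
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    x = x1.copy()
    x[top:bottom, left:right] = x2[top:bottom, left:right]
    # Recompute lambda from the clipped box so the label matches the pixels kept.
    lam = 1.0 - (bottom - top) * (right - left) / (h * w)
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam


# Usage: mix an all-black and an all-white 32x32 RGB "image" of two classes.
rng = np.random.default_rng(0)
a, b = np.zeros((32, 32, 3)), np.ones((32, 32, 3))
ya, yb = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x_mix, y_mix, lam_mix = mixup(a, ya, b, yb, rng=rng)
x_cut, y_cut, lam_cut = cutmix(a, ya, b, yb, rng=rng)
```

Neither function attempts to hide what it adds to the image: MixUp produces a blended overlay of two scenes, and CutMix leaves a hard rectangular boundary. These are plausibly the kind of artificial information the abstract refers to.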