NeurIPS 2022

On the Discrimination Risk of Mean Aggregation Feature Imputation in Graphs

Arjun Subramonian, Kai-Wei Chang, Yizhou Sun

Cited by 9

Abstract

In human networks, nodes belonging to a marginalized group often have a disproportionate rate of unknown or missing features. This, in conjunction with graph structure and known feature biases, can cause graph feature imputation algorithms to predict values for unknown features that make the marginalized group's feature values more distinct from the dominant group's feature values than they are in reality. We call this distinction the discrimination risk. We prove that a higher discrimination risk can amplify the unfairness of a machine learning model applied to the imputed data. We then formalize a general graph feature imputation framework called mean aggregation imputation and theoretically and empirically characterize graphs in which applying this framework can yield feature values with a high discrimination risk. We propose a simple algorithm to ensure mean aggregation-imputed features provably have a low discrimination risk, while minimally sacrificing reconstruction error (with respect to the imputation objective). We evaluate the fairness and accuracy of our solution on synthetic and real-world credit networks.

Related work

Feature imputation

Feature imputation algorithms leverage known feature values to predict unknown feature values (and sometimes update known feature values). For example, unknown feature values may be filled as the mean of known values [16]. However, more intricate feature imputation methods have been proposed in the ML, statistics, and epidemiology literature, with popular approaches including matrix completion [32, 33, 34, 35], nearest neighbors [36, 37], multiple imputation via conditional models [38, 39, 40], and causal inference [41, 42]. Notably, while feature imputation may be applied to data with unknown feature values prior to the data being passed to an ML model,