ACL 2021
Decoupling Adversarial Training for Fair NLP
Xudong Han, Timothy Baldwin, Trevor Cohn
Abstract
Adversarial debiasing can help to learn fairer models. Previous work has assumed that both main task labels and protected attributes are available in the dataset. However, protected labels are often unavailable, or only available in limited numbers. In this paper, we propose a training strategy which needs only a small volume of protected labels in adversarial training, incorporating an estimation method to transfer private-labelled instances from one dataset to another. We demonstrate the in-domain and cross-domain effectiveness of our method through a range of experiments.