ACL 2021

Decoupling Adversarial Training for Fair NLP

Xudong Han, Timothy Baldwin, Trevor Cohn

Abstract

Adversarial debiasing can help to learn fairer models. Previous work has assumed that both main task labels and protected attributes are available in the dataset. However, protected labels are often unavailable, or available only in limited numbers. In this paper, we propose a training strategy that requires only a small number of protected labels for adversarial training, incorporating an estimation method to transfer private-labelled instances from one dataset to another. We demonstrate the in-domain and cross-domain effectiveness of our method through a range of experiments.
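To make the "small number of protected labels" setting concrete, the sketch below computes a common form of the adversarial debiasing objective for the encoder and main classifier: the main-task loss minus λ times the adversary's loss. The adversary term is averaged only over instances whose protected label is known. This is an illustrative assumption about how limited protected labels can enter the loss, not the authors' exact method. The function names `cross_entropy` and `decoupled_adversarial_objective`, the mask argument `has_protected`, and the weight `lam` are all hypothetical, and the probabilities stand in for model outputs.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true labels; probs has shape (n, k)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    eps = 1e-12  # guards against log(0)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + eps)))

def decoupled_adversarial_objective(main_probs, main_labels,
                                    adv_probs, protected_labels,
                                    has_protected, lam=1.0):
    """Hypothetical encoder-side objective: main loss minus lam * adversary loss.

    The main-task loss uses every instance. The adversary loss uses only the
    instances whose protected label is available (has_protected is True), so
    unlabelled instances may carry any placeholder protected label (e.g. -1).
    Returns (objective, main_loss, adv_loss).
    """
    main_loss = cross_entropy(main_probs, main_labels)
    mask = np.asarray(has_protected, dtype=bool)
    if not mask.any():
        # No protected labels in this batch: only the main task contributes.
        return main_loss, main_loss, 0.0
    adv_probs = np.asarray(adv_probs, dtype=float)
    protected_labels = np.asarray(protected_labels, dtype=int)
    adv_loss = cross_entropy(adv_probs[mask], protected_labels[mask])
    # The encoder is rewarded when the adversary fails (higher adv_loss).
    return main_loss - lam * adv_loss, main_loss, adv_loss

# Usage: two instances, only the second has a known protected label.
obj, main_loss, adv_loss = decoupled_adversarial_objective(
    main_probs=[[0.9, 0.1], [0.2, 0.8]], main_labels=[0, 1],
    adv_probs=[[0.5, 0.5], [0.7, 0.3]], protected_labels=[-1, 0],
    has_protected=[False, True], lam=1.0)
```

In a full training loop, the adversary would separately minimise `adv_loss` on the same labelled subset, usually through alternating updates or a gradient-reversal layer. Masking the adversary term is what lets the main task keep training on every instance when protected labels are scarce.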