ICML 2020

Optimizing Data Usage via Differentiable Rewards

Xinyi Wang, Hieu Pham, Paul Michel, Antonios Anastasopoulos, Jaime G. Carbonell, Graham Neubig


Abstract

To acquire a new skill, humans learn better and faster if a tutor tells them how much attention to pay to particular content or practice problems, based on their current knowledge level. Similarly, a machine learning model could potentially be trained better if data is presented in a way that adapts to its current learning state. In this paper, we examine the problem of training an adaptive scorer that weights data instances to maximally benefit learning. Training such a scorer efficiently is challenging: to precisely quantify the effect of a data instance on the final model, a naive approach would require completing the entire training process and observing final performance. We propose an efficient alternative, Differentiable Data Selection (DDS), that formulates the scorer as a learnable function of the training data that can be efficiently updated along with the main model being trained. Specifically, DDS updates the scorer with an intuitive reward signal: it should upweight training data whose gradient is similar to the gradient on a development set on which we ultimately want to perform well. Without significant computing overhead, DDS delivers consistent improvements over several strong baselines on two very different tasks, machine translation and image classification.
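The reward signal described in the abstract can be illustrated with a small toy example. The following is a minimal sketch, not the authors' implementation. It assumes a linear regression model with squared loss, cosine similarity as the measure of gradient similarity, and a scorer that is simply a softmax over one logit per training example. All function names (`per_example_grads`, `alignment_rewards`, `scorer_step`) and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def per_example_grads(w, X, y):
    """Gradient of 0.5 * (x @ w - y)**2 with respect to w, one row per example."""
    residual = X @ w - y              # shape (n,)
    return residual[:, None] * X      # shape (n, d)

def alignment_rewards(train_grads, dev_grad, eps=1e-12):
    """Cosine similarity between each training gradient and the dev gradient.

    Assumption: cosine similarity stands in for "similar gradient".
    """
    num = train_grads @ dev_grad
    denom = np.linalg.norm(train_grads, axis=1) * np.linalg.norm(dev_grad) + eps
    return num / denom

def softmax(z):
    z = z - z.max()                   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def scorer_step(logits, rewards, lr=1.0):
    """One expected policy-gradient step on the softmax scorer logits.

    Uses the expected reward under the current scorer as a baseline.
    """
    p = softmax(logits)
    baseline = p @ rewards
    return logits + lr * p * (rewards - baseline)

# Toy data (hypothetical): three training examples, one dev example.
w = np.zeros(2)
X_train = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
y_train = np.array([1.0, 1.0, 1.0])
X_dev = np.array([[1.0, 0.0]])
y_dev = np.array([1.0])

train_grads = per_example_grads(w, X_train, y_train)
dev_grad = per_example_grads(w, X_dev, y_dev).mean(axis=0)

rewards = alignment_rewards(train_grads, dev_grad)
logits = scorer_step(np.zeros(3), rewards)
weights = softmax(logits)
```

In this toy setting, the first training example has a gradient aligned with the development gradient, the second is orthogonal to it, and the third points in the opposite direction, so after one scorer update the weights are ordered accordingly. In a full training loop, the scorer update would be interleaved with updates to the main model, since the gradients change as the model trains.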