ICLR2023

Learning without Prejudices: Continual Unbiased Learning via Benign and Malignant Forgetting

Myeongho Jeon, Hyoje Lee, Yedarm Seong, Myungjoo Kang

摘要

In the recent years, lifelong learning (LL) has attracted a great deal of attention in the deep learning community, where it is often called continual learning. Though it is well-known that deep neural networks (DNNs) have achieved state-of-the-art performances in many machine learning (ML) tasks, the standard multi-layer perceptron (MLP) architecture and DNNs suffer from catastrophic forgetting [McCloskey and Cohen, 1989] which makes it difficult for continual learning. The problem is that when a neural network is used to learn a sequence of tasks, the learning of the later tasks may degrade the performance of the models learned for the earlier tasks. Our human brains, however, seem to have this remarkable ability to learn a large number of different tasks without any of them negatively interfering with each other. Continual learning algorithms try to achieve this same ability for the neural networks and to solve the catastrophic forgetting problem. Thus, in essence, continual learning performs incremental learning of new tasks. Unlike many other LL techniques, the emphasis of current continual learning algorithms has not been on how to leverage the knowledge learned in previous tasks to help learn the new task better. In this chapter, we first give an overview of catastrophic forgetting (Section 4.1) and survey the proposed continual learning techniques that address the problem (Section 4.2). We then introduce several recent continual learning methods in more detail . Two evaluation papers are also covered in Section 4.9 to evaluate the performances of some existing continual learning algorithms. Last but not least, we give a summary of the chapter and list the relevant evaluation datasets. CATASTROPHIC FORGETTING Catastrophic forgetting or catastrophic interference was first recognized by McCloskey and Cohen [1989]. They found that, when training on new tasks or categories, a neural network tends to forget the information learned in the previous trained tasks. This usually means a new task will likely override the weights that have been learned in the past, and thus degrade the model performance for the past tasks. Without fixing this problem, a single neural network will not be able to adapt itself to an LL scenario, because it forgets the existing information/knowledge when it learns new things. This was also referred to as the stability-plasticity dilemma in Abraham and Robins [2005]. On the one hand, if a model is too stable, it will not be able to consume new information from the future training data. On the other hand, a model with sufficient plasticity