NeurIPS 2023

FeCAM: Exploiting the Heterogeneity of Class Distributions in Exemplar-Free Continual Learning

Dipam Goswami, Yuyang Liu, Bartlomiej Twardowski, Joost van de Weijer


Abstract

Exemplar-free class-incremental learning (CIL) poses several challenges: it prohibits the rehearsal of data from previous tasks and therefore suffers from catastrophic forgetting. Recent approaches that learn the classifier incrementally while freezing the feature extractor after the first task have gained much attention. In this paper, we explore prototypical networks for CIL, which generate new class prototypes using the frozen feature extractor and classify features by their Euclidean distance to the prototypes. In an analysis of the feature distributions of classes, we show that classification based on Euclidean metrics is successful for jointly trained features. However, when learning from non-stationary data, we observe that the Euclidean metric is suboptimal and that feature distributions are heterogeneous. To address this challenge, we revisit the anisotropic Mahalanobis distance for CIL. In addition, we empirically show that modeling the feature covariance relations is better than previous attempts at sampling features from normal distributions and training a linear classifier. Unlike existing methods, our approach generalizes to both many- and few-shot CIL settings, as well as to domain-incremental settings. Interestingly, without updating the backbone network, our method obtains state-of-the-art results on several standard continual learning benchmarks. Code is available at https://github.com/dipamgoswami/FeCAM.
This paper investigates methods to enhance the representation of class prototypes in CIL, aiming to improve plasticity within the stability-favoring classifier-incremental setting. A standard practice in few-shot CIL [31, 42, 76, 70, 72] is to obtain the feature embeddings of new class samples and average them to generate class-wise prototypes. The test image features are then classified by computing the Euclidean distance to these mean prototypes.
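The prototype-based nearest-class-mean (NCM) step described above (average the embeddings of each class, then assign a test feature to the nearest prototype in Euclidean distance) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: the features are assumed to be precomputed by a frozen feature extractor, and the function names `class_prototypes` and `ncm_predict` are hypothetical.

```python
import numpy as np


def class_prototypes(features: np.ndarray, labels: np.ndarray):
    """Average the (frozen-backbone) embeddings of each class into one prototype."""
    classes = np.unique(labels)
    protos = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, protos


def ncm_predict(test_features: np.ndarray, classes: np.ndarray, protos: np.ndarray):
    """Assign each test feature to the class whose prototype is nearest (Euclidean)."""
    # (n_test, n_classes) matrix of squared Euclidean distances via broadcasting
    d2 = ((test_features[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[d2.argmin(axis=1)]
```

Because prototypes are plain per-class means, a new class can be added by computing its mean from the frozen features, without touching existing prototypes or training any parameters.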
The Euclidean distance is used in the NCM classifier following [17], which claims that the highly non-linear nature of learned representations eliminates the need to learn the Mahalanobis [65, 11] metric previously used [38]. Our analysis shows that this holds for classes seen during training; for new classes, however, the Euclidean distance is suboptimal. To address this problem, we propose to use the anisotropic Mahalanobis distance. In Fig. 1, we illustrate how the feature representations vary in CIL settings. Here, we explore the high-stability case in CIL, where, unlike in joint training, the model does not produce spherical representations for new classes in the feature space. It is therefore intuitive to take the feature covariances into account when computing the distance, since the covariance relations between feature dimensions better capture the more complex class structure in the high-dimensional feature space. Additionally, in Fig. 3, we analyze the singular values of old- and new-class features to observe how the variances of their feature distributions change, which suggests a shift towards more anisotropic representations.
While previous methods [38] proposed learning Mahalanobis metrics, we propose an optimal Bayes classifier that models the covariance relations of the features and employs class prototypes. We term this approach Feature Covariance-Aware Metric (FeCAM). We compute the covariance matrix of each class from the feature embeddings of its training samples and perform correlation normalization to ensure similar variances across all class representations, which is crucial for comparing distances between classes. We investigate various ways of using covariance relations in continual settings. We posit that a Bayes classifier yields better decision boundaries than previous attempts [67], which sample features from Gaussian distributions and train a linear classifier.
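A covariance-aware prototype classifier in the spirit of the description above might look like the minimal sketch below. It is not the authors' implementation (see the linked repository for that), and several choices are assumptions: per-class covariances are estimated with `np.cov`; "correlation normalization" is read as rescaling each covariance matrix to unit diagonal, $\hat{\Sigma}_{ij} = \Sigma_{ij} / (\sigma_i \sigma_j)$ with $\sigma_i = \sqrt{\Sigma_{ii}}$; and a hypothetical `ridge` term is added so the matrix stays invertible when a class has fewer samples than feature dimensions. Test features are assigned to the class minimizing $(x - \mu_c)^\top \hat{\Sigma}_c^{-1} (x - \mu_c)$.

```python
import numpy as np


def correlation_normalize(cov: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Rescale a covariance matrix to unit diagonal: C_ij = S_ij / (s_i * s_j)."""
    std = np.sqrt(np.clip(np.diag(cov), eps, None))  # floor avoids division by zero
    return cov / np.outer(std, std)


def fit_class_stats(features: np.ndarray, labels: np.ndarray, ridge: float = 0.1):
    """Return class ids, prototypes, and inverse normalized covariances per class.

    `ridge` is a hypothetical regularizer, not taken from the text: it keeps the
    normalized matrix invertible when a class has fewer samples than dimensions.
    Each class needs at least two samples for the sample covariance to be defined.
    """
    classes = np.unique(labels)
    dim = features.shape[1]
    means, precisions = [], []
    for c in classes:
        x = features[labels == c]
        cov = np.atleast_2d(np.cov(x, rowvar=False))  # (dim, dim) sample covariance
        corr = correlation_normalize(cov) + ridge * np.eye(dim)
        means.append(x.mean(axis=0))
        precisions.append(np.linalg.inv(corr))
    return classes, np.stack(means), np.stack(precisions)


def mahalanobis_predict(test_features, classes, means, precisions):
    """Assign each test feature to the class with the smallest Mahalanobis distance."""
    dists = []
    for mu, prec in zip(means, precisions):
        diff = test_features - mu
        # (x - mu)^T P (x - mu) for every row x of test_features
        dists.append(np.einsum("nd,de,ne->n", diff, prec, diff))
    return classes[np.argmin(np.stack(dists, axis=1), axis=1)]
```

On a toy set where class 0 lies on the line y = x around the origin and class 1 is a small isotropic cluster around (3, 0), the point (5, 5) is closer to the class-1 prototype in Euclidean distance but is assigned to class 0 by this sketch, because it lies along class 0's dominant direction. The sketch compares distances without the log-determinant term that a full Gaussian Bayes classifier would include; relying on the normalization to make classes comparable is an assumption of this illustration.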
The proposed approach is simple to implement and requires no training, since it employs a Bayes classifier. Unlike existing methods, FeCAM can be used for both many-shot and few-shot CIL. Additionally, we achieve superior performance with pretrained models on both class-incremental and domain-incremental benchmarks.

Related Work

Many-shot class-incremental learning (MSCIL) is the conventional setting, where sufficient training data is available for all classes. A critical aspect in many-shot CIL methods is the semantic drift in