NeurIPS 2022

Out-of-Distribution Detection via Conditional Kernel Independence Model

Yu Wang, Jingjing Zou, Jingyang Lin, Qing Ling, Yingwei Pan, Ting Yao, Tao Mei

Abstract

Recently, various methods have been introduced to address the out-of-distribution (OOD) detection problem by exposing the model to outliers during training. These methods usually rely on a discriminative softmax metric or an energy-based score to screen OOD samples. In this paper, we probe an alternative hypothesis on OOD detection by constructing a novel latent variable model based on independent component analysis (ICA) techniques. This method, named Conditional-i, builds upon a probabilistic formulation and applies the Hilbert-Schmidt Independence Criterion (HSIC), which offers a convenient way to optimize variable dependencies. Conditional-i exclusively encodes the useful class condition into the probabilistic model, which makes it convenient to provide theoretical support for the OOD detection task. To facilitate the implementation of the Conditional-i model, we construct unique memory bank architectures that allow for convenient end-to-end training within a tractable budget. Empirical results demonstrate an evident performance boost on benchmarks against SOTA methods. We also provide theoretical justification that our training strategy is guaranteed to bound the error in the context of OOD detection. Code is available at: https://github.com/OODHSIC/conditional-i .

* Yu Wang and Jingjing Zou contributed equally to this work (joint first author).

36th Conference on Neural Information Processing Systems (NeurIPS 2022).

Unlike previous efforts that rely on canonical discriminative or generative models, we view the OOD detection problem from the new perspective of independent component analysis (ICA). Our starting point is the hypothesis that OOD data should exhibit only a slight dependency on in-distribution training data, which effectively constrains the OOD data to share limited mutual information with the inliers.
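
To make the independence measurement concrete, the Hilbert-Schmidt Independence Criterion can be illustrated with its standard biased empirical estimator, tr(KHLH) / (n - 1)^2, where K and L are Gram matrices of the two variables and H is the centering matrix. The sketch below is not the authors' implementation: the Gaussian kernel, the bandwidth `sigma`, the toy data, and all function names are assumptions chosen for illustration, and it uses only the Python standard library.

```python
import math
import random


def gaussian_gram(xs, sigma=1.0):
    """Gram matrix K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    n = len(xs)
    gram = []
    for i in range(n):
        row = []
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
            row.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
        gram.append(row)
    return gram


def center(K):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - (1/n) 11^T."""
    n = len(K)
    row_mean = [sum(r) / n for r in K]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - col_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2.

    xs and ys are equal-length lists of feature vectors (lists of floats).
    Values near zero suggest weak dependence under the chosen kernel.
    """
    n = len(xs)
    Kc = center(gaussian_gram(xs, sigma))
    Lc = center(gaussian_gram(ys, sigma))
    # tr(K H L H) equals the elementwise inner product of the centered matrices.
    return sum(Kc[i][j] * Lc[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2


# Toy data (hypothetical): one variable that depends on xs, one that does not.
rng = random.Random(0)
xs = [[rng.gauss(0.0, 1.0)] for _ in range(40)]
ys_dep = [[x[0] + 0.1 * rng.gauss(0.0, 1.0)] for x in xs]
ys_ind = [[rng.gauss(0.0, 1.0)] for _ in range(40)]

print("HSIC(x, dependent y):  ", hsic(xs, ys_dep))
print("HSIC(x, independent y):", hsic(xs, ys_ind))
```

In this toy setting the dependent pair should yield a clearly larger value than the independent pair, which is the property an independence-based OOD criterion relies on. Computing full n-by-n Gram matrices costs O(n^2) in time and memory, which hints at why a practical training scheme needs to keep the number of compared samples within a budget.
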
Under this hypothesis, the inliers can extract only little predictive information from outliers during training, and we distinguish OOD samples through the corresponding independence measurements at test time. The contemporary work [39] attempted to probe the OOD problem from a similar view. However, [39] demonstrates the practical value of such an independence assumption only through empirical success, and it lacks theoretical justification and supporting analysis. In this work, we propose a brand new motivating generative model by incorporating additional class conditions into the latent variable model. This work also aims to showcase the theoretical soundness of the approach through the transparent lens of ICA techniques, which provide the desired technical convenience. Our contributions in this paper are:

C1: We propose a new OOD detection framework called the Conditional-i model (reminiscent of Conditional-independence model). In comparison to [39], Conditional-i additionally encodes the class condition into the probabilistic dependence model.
C2: To facilitate the end-to-end training of the Conditional-i model, we construct an efficient memory bank architecture that keeps training within a practical computational budget.
C3: Conditional-i can be implemented efficiently whether or not training OOD data is available.
C4: We provide theoretical justifications for our new model.

Empirical results demonstrate the evident superiority of our new proposal over state-of-the-art methods on computer vision and NLP tasks.

Related Work

Challenges for OOD detection. Empirical evidence in [12, 28, 50, 67] shows that deep generative models trained on image datasets can often erroneously assign a high likelihood to OOD inputs. Meanwhile, as observed in [23, 33], discriminative models encounter a similar issue: they return high confidence on OOD samples or misclassified samples.
Many contemporary works have attempted to understand the principle behind this phenomenon. The work in [52] suggested that the background statistics shared between in-distribution data and OOD data could interfere with the test. The works in [50, 67] discussed the impact of typicality on the test. In contrast, [78] challenged the prevailing typical set hypothesis and the assumption that the in- and out-distributions overlap. The authors of [76] argued that both accurate density estimation and a discriminative classifier are critical, and hence proposed a hybrid model to address the issue. Other research aims to alleviate the issue from the view of miscalibration [16, 19, 23, 44]. The work in [18] further demonstrated that the deep architecture can have a significant impact on OOD detection performance.
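
For reference, the two discriminative scores that this paper contrasts itself with, the softmax confidence and the energy score, can be computed directly from a classifier's logits. The sketch below assumes the commonly used definitions (maximum softmax probability, and energy E = -T * log sum_i exp(f_i / T)); the source does not spell these out, and the function names and example logits are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def max_softmax_probability(logits):
    """Largest softmax probability; higher means a more confident prediction."""
    lse = log_sum_exp(logits)
    return max(math.exp(z - lse) for z in logits)


def energy_score(logits, temperature=1.0):
    """Energy E = -T * logsumexp(logits / T).

    Under this assumed convention, lower energy is read as more
    in-distribution-like, and higher energy as more OOD-like.
    """
    return -temperature * log_sum_exp([z / temperature for z in logits])


# Hypothetical logits: one peaked prediction and one flat prediction.
peaked = [10.0, 0.0, 0.0]
flat = [0.0, 0.0, 0.0]

for name, logits in (("peaked", peaked), ("flat", flat)):
    print(name, max_softmax_probability(logits), energy_score(logits))
```

Both scores depend only on the logits of a single input, so a network that produces peaked logits on an OOD sample will look confident under either score. This is the failure mode described above for discriminative models.
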