KDD2025

Dynamic Deep Clustering of High-Dimensional Directional Data via Hyperspherical Embeddings with Bayesian Nonparametric Mixtures

Zhiwen Luo, Wentao Fan, Manar Amayri, Nizar Bouguila

Abstract

Clustering high-dimensional directional data (i.e., L2-normalized vectors) presents significant challenges due to the intricate spherical representations of latent embeddings and the limitations of classical (non-deep) clustering techniques. Moreover, dynamically inferring the number of clusters remains a fundamental issue in existing deep clustering methods, especially those that rely on complex model-selection criteria. This paper addresses these challenges by introducing a novel deep nonparametric clustering framework that employs hyperspherical latent embeddings within a Variational Autoencoder architecture, enhanced by an infinite von Mises-Fisher Mixture Model (In-vMFMM) as a dynamic prior. This approach enables the number of clusters to adapt automatically during training, eliminating the need for a predefined number of clusters and for traditional model-selection processes. Our scalable architecture effectively integrates the In-vMFMM with hyperspherical embeddings to tackle the complexities of directional data. Using a joint training strategy, our method alternates between updating the neural network parameters and adjusting the mixture model prior via nonparametric variational Bayes. Empirical evaluations on benchmark datasets, including the challenging ImageNet-50, demonstrate that our approach significantly outperforms state-of-the-art deep nonparametric clustering methods. It also robustly estimates the number of clusters, showcasing its effectiveness and versatility in handling high-dimensional directional data.
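
To make "directional data" and a von Mises-Fisher (vMF) mixture concrete, here is a minimal sketch in plain Python. It is not the paper's method: it assumes a fixed number of components, equal mixing weights, and a single shared concentration `kappa`, so the vMF normalizing constants cancel and each responsibility is proportional to exp(kappa * mu^T x). The function names `l2_normalize` and `vmf_responsibilities` are hypothetical and chosen only for illustration.

```python
import math


def l2_normalize(v):
    """Project a vector onto the unit hypersphere (assumes v is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def vmf_responsibilities(x, means, kappa):
    """Soft cluster assignments for a unit vector x.

    Simplifying assumptions (not from the paper): equal mixing weights and
    one shared concentration kappa, so the vMF normalizing constants cancel
    and only exp(kappa * mu^T x) remains for each component.
    """
    logits = [kappa * sum(m_i * x_i for m_i, x_i in zip(mu, x)) for mu in means]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - top) for logit in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Toy 2-D example: one embedding and two hypothetical cluster directions.
x = l2_normalize([3.0, 4.0])
means = [l2_normalize([1.0, 1.0]), l2_normalize([-1.0, 0.0])]
resp = vmf_responsibilities(x, means, kappa=10.0)
```

In this toy setup the embedding points closer to the first mean direction, so the first responsibility is the larger one. A nonparametric variant, as described in the abstract, would additionally infer the number of components and per-component concentrations rather than fixing them.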