KDD 2022
Online Clustering: Algorithms, Evaluation, Metrics, Applications and Benchmarking
Jacob Montiel, Hoang-Anh Ngo, Minh-Huong Le Nguyen, Albert Bifet
Cited by 4
Abstract
Online clustering algorithms play a critical role in data science: compared with traditional clustering methods, they offer advantages in running time, memory usage and complexity while maintaining high performance. This tutorial serves, first, as a survey of online machine learning and, in particular, of data stream clustering methods. It presents state-of-the-art algorithms and their associated core research threads, grouped into categories based on distance, density grids and hidden statistical models. It also investigates in depth clustering validity indices, an important part of the clustering process that is often neglected or replaced with classification metrics, which leads to misleading interpretation of the final results.
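To make the distance-based category concrete, the following is a minimal sketch of sequential (MacQueen-style) k-means on a synthetic stream. It is an illustration only, not one of the tutorial's algorithms or any library's API. The names `OnlineKMeans`, `learn_one`, `predict_one`, `sq_dist` and `make_stream` are hypothetical, and the two-blob stream is made-up data. Each point is seen once and then discarded, so memory stays proportional to the number of centers. The running mean squared distance to the assigned center stands in for an internal validity measure that needs no class labels.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


class OnlineKMeans:
    """Hypothetical sequential k-means: one pass, memory O(k * d)."""

    def __init__(self, k):
        self.k = k
        self.centers = []  # current cluster centers
        self.counts = []   # number of points absorbed by each center

    def predict_one(self, x):
        # Index of the nearest center.
        return min(range(len(self.centers)),
                   key=lambda j: sq_dist(x, self.centers[j]))

    def learn_one(self, x):
        # The first k points of the stream become the initial centers.
        if len(self.centers) < self.k:
            self.centers.append(list(x))
            self.counts.append(1)
            return len(self.centers) - 1
        j = self.predict_one(x)
        self.counts[j] += 1
        lr = 1.0 / self.counts[j]  # step size that yields a running mean
        self.centers[j] = [c + lr * (xi - c)
                           for c, xi in zip(self.centers[j], x)]
        return j


def make_stream(n, seed=0):
    """Synthetic stream (assumption): alternates between two Gaussian blobs."""
    rng = random.Random(seed)
    blobs = [(0.0, 0.0), (10.0, 10.0)]
    for i in range(n):
        cx, cy = blobs[i % 2]
        yield (cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))


model = OnlineKMeans(k=2)
sse, n = 0.0, 0
for x in make_stream(1000):
    j = model.learn_one(x)                # update the model with one point
    sse += sq_dist(x, model.centers[j])   # label-free cohesion, tracked online
    n += 1

print("centers:", model.centers)
print("mean squared distance to assigned center:", sse / n)
```

The cohesion score here is deliberately simple. It shows that a clustering can be monitored on a stream without ground-truth labels, which is the role the abstract assigns to clustering validity indices as opposed to classification metrics.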