NeurIPS2022
Semi-Supervised Generative Models for Multiagent Trajectories
Dennis Fassmeyer, Pascal Fassmeyer, Ulf Brefeld
6 citations
Abstract
Recent work in pathological speech classification has employed supervised learning algorithms such as neural networks and support vector machines to classify speech as healthy or pathological. A challenge in applying such machine learning techniques to pathological speech classification is the labelled data shortage problem. While labelled data are expensive and scarce, unlabelled data are inexpensive and plentiful. Labelled data acquisition often entails significant human effort and time-consuming experimental design. Further, for medical applications, privacy and ethical issues must be addressed where patient data is collected. In this thesis, we investigate a semi-supervised learning (SSL) approach that employs a generative model to incorporate both labelled and unlabelled data into the training process. Generative models explored include both a generative adversarial network (GAN) and a variational autoencoder (VAE). To employ a GAN, we modify its traditional discriminator to not only differentiate between real and fake speech samples but to also classify the given sample as healthy or pathological. To employ a VAE, we first pre-train the VAE with unlabelled data and subsequently, incorporate the pre-trained encoder into a classifier to be trained on labelled data. We test our approach using three commonly used pathological speech datasets: the Spanish Parkinson's Diseases Dataset (SPDD), the Saarbrucken Voice Database (SVD) and the Arabic Voice Pathology Database (AVPD). We compare the performance of the GAN and VAE-based approaches trained on both labelled and unlabelled data with a traditional supervised approach based on a convolutional neural network (CNN) trained only on labelled data. We observe that our SSL-based approach leads to an accuracy gain compared to a baseline CNN trained only on labelled pathological speech data. This promising result shows that our approach has the potential to alleviate the labelled data shortage problem in pathological speech classification and other medical applications where labelled data acquisition is challenging. x Artificial Intelligence is the new electricity.