NeurIPS 2022

Addressing Resource Scarcity across Sign Languages with Multilingual Pretraining and Unified-Vocabulary Datasets

Gokul NC, Manideep Ladi, Sumit Negi, Prem Selvaraj, Pratyush Kumar, Mitesh M. Khapra

Cited by 17

Abstract

There are over 300 sign languages in the world, many of which have very limited or no labelled sign-to-text datasets. To address low-resource data scenarios, self-supervised pretraining and multilingual finetuning have been shown to be effective in natural language and speech processing. In this work, we apply these ideas to sign language recognition. We make three contributions. First, we release SignCorpus, a large pretraining dataset on sign languages comprising about 4.6K hours of signing data across 10 sign languages. SignCorpus is curated from sign language videos on the internet, filtered for data quality, and converted into sequences of pose keypoints, thereby removing all personally identifiable information (PII). Second, we release Sign2Vec, a graph-based model with 5.2M parameters that is pretrained on SignCorpus. We envisage Sign2Vec as a multilingual large-scale pretrained model which can be fine-tuned for various sign recognition tasks across languages. Third, we create MultiSign-ISLR, a multilingual and label-aligned dataset of sequences of pose keypoints from 11 labelled datasets across 7 sign languages, and MultiSign-FS, a new finger-spelling training and test set across 7 languages. On these datasets, we fine-tune Sign2Vec to create multilingual isolated sign recognition models. With experiments on multiple benchmarks, we show that pretraining and multilingual transfer are effective, giving significant gains over state-of-the-art results. We compare against baselines in which both the pretraining and finetuning stages are either monolingual or multilingual. We see significant improvements against each of these baselines and also against state-of-the-art models trained individually for each dataset. We also fine-tune Sign2Vec on MultiSign-FS to create a multilingual finger-spelling recognition model. With similar comparisons against baselines, we report large improvements in accuracy due to both multilingual pretraining and joint fine-tuning.
With these results, we demonstrate the value of the datasets we release (SignCorpus, MultiSign-ISLR, and MultiSign-FS) and the effectiveness of the multilingual Sign2Vec model. All models are released as part of the OpenHands repository.
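
To make the idea of a label-aligned, unified-vocabulary dataset concrete, the following is a minimal sketch of one way labels from several isolated sign recognition datasets could be mapped into a shared label space. It is an illustration under stated assumptions, not the procedure used to build MultiSign-ISLR: the dataset names, the example glosses, and the helpers `normalize` and `build_unified_vocab` are all hypothetical, and the sketch assumes each dataset's labels are available as English glosses.

```python
# Hypothetical per-dataset label lists; the names and glosses are
# illustrative placeholders, not taken from the released datasets.
DATASET_LABELS = {
    "dataset_a": ["HELLO", "thank you", "Book"],
    "dataset_b": ["hello", "BOOK", "water"],
}


def normalize(label: str) -> str:
    """Canonicalize a gloss: trim, lowercase, join words with underscores."""
    return "_".join(label.strip().lower().split())


def build_unified_vocab(dataset_labels):
    """Build a shared label space across datasets.

    Returns (vocab, mapping), where vocab maps each canonical gloss to a
    shared integer id, and mapping maps (dataset name, original label)
    to that same shared id.
    """
    vocab = {}
    mapping = {}
    # Iterate datasets in sorted order so the assigned ids are deterministic.
    for name in sorted(dataset_labels):
        for label in dataset_labels[name]:
            key = normalize(label)
            if key not in vocab:
                vocab[key] = len(vocab)
            mapping[(name, label)] = vocab[key]
    return vocab, mapping


vocab, mapping = build_unified_vocab(DATASET_LABELS)
```

With such a mapping, examples of the same concept from different datasets share one class id, which is what allows a single classifier head to be fine-tuned jointly across languages. Real alignment would likely need more than string normalization, for example handling synonyms or glosses recorded in different written languages.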