EMNLP 2024

On Mitigating Performance Disparities in Multilingual Speech Recognition

Monorama Swain, Anna Zee, Anders Søgaard

Abstract

How far have we come in mitigating performance disparities across genders in multilingual speech recognition? We compare how different fine-tuning algorithms for automatic speech recognition affect gender disparity across model sizes, languages, and genders. We consider both performance-focused and fairness-promoting algorithms. Across languages, larger models perform slightly better for female speakers, regardless of the fine-tuning algorithm. Adapter fusion offers the best tradeoff between performance and parity. The fairness-promoting fine-tuning algorithms (Group-DRO and Spectral Decoupling) hurt performance compared to adapter fusion while yielding only slightly better performance parity. LoRA increases disparities slightly. Fine-tuning techniques aimed at mitigating unfairness lead to slightly higher variance in performance across languages, with the exception of adapter fusion.