Publication

Noise-Robust Multilingual Speech Recognition and the Tatar Speech Corpus

Automatic speech recognition research long focused on individual languages, but multilingual recognition has recently become an active area of research. For instance, Whisper by OpenAI is capable of recognizing speech in 99 languages. However, the performance of Whisper is significantly lower for low-resource languages than for high-resource ones. In this work, we address this gap and present a fine-tuning strategy for the pre-trained Whisper model that improves its performance for a low-resource language family while maintaining performance for a set of high-resource languages. Specifically, our Soyle model exhibited high performance for both the Turkic language family (11 languages) and the official languages of the United Nations. Our work also presents the first large open-source speech corpus for the Tatar language. We demonstrate that speech recognition performance for Tatar improves when the model is trained using the new Tatar Speech Corpus (TatSC). Our model is also trained to be noise-robust. We open-source our model and TatSC to encourage further research. We envision that our fine-tuning approach will guide the creation of multilingual speech recognition models for other low-resource language families.

Information about the publication

Authors:

Saida Mussakhojayeva, Rinat Gilmullin, Bulat Khakimov, Mansur Galimov, Daniil Orel, Adal Adilbekov, Huseyin Atakan Varol