ISSAI - Institute of Smart Systems and Artificial Intelligence

Söyle: Noise-Robust Multilingual Speech Recognition with Long Transcription Featuring the Tatar Speech Corpus

After a long period of focus on individual languages, research in automatic speech recognition has recently turned to multilingual models. For instance, Whisper by OpenAI is capable of recognizing speech in 99 languages. However, the performance of Whisper is significantly lower for low-resource languages than for high-resource ones. In this work, we address this gap and present a fine-tuning strategy for the pre-trained Whisper model that improves its performance for a low-resource language family while maintaining performance for a set of high-resource languages.

Specifically, our Söyle model exhibited high performance for both the Turkic language family (11 languages) and the official languages of the United Nations. Our work also presents the first large open-source speech corpus for the Tatar language. We demonstrate that speech recognition performance for Tatar improves when the model is trained using the new Tatar Speech Corpus (TatSC). Our model is also trained to be noise-robust and to perform long-form transcription. We open-source our model and TatSC to encourage further research. We envision that our fine-tuning approach will guide the creation of multilingual speech recognition models for other low-resource language families.

If you use the ISSAI Söyle: Noise-Robust Multilingual Speech Recognition with Long Transcription Featuring the Tatar Speech Corpus for commercial purposes, please add this statement to your product or service:

Our product uses Söyle: Noise-Robust Multilingual Speech Recognition with Long Transcription Featuring the Tatar Speech Corpus (doi: 10.48342/hkc6-yq77), which is available under a Creative Commons Attribution 4.0 International License (Creative Commons — Attribution 4.0 International — CC BY 4.0).

If you use the ISSAI Söyle: Noise-Robust Multilingual Speech Recognition with Long Transcription Featuring the Tatar Speech Corpus for research, please cite it as:

Saida Mussakhojayeva, Rinat Gilmullin, Daniil Orel, Bulat Khakimov, Adal Abilbekov, Mansur Galimov and Huseyin Atakan Varol. Söyle: Noise-Robust Multilingual Speech Recognition with Long Transcription Featuring the Tatar Speech Corpus. MDPI

Download data Download code

This work is licensed under a Creative Commons Attribution 4.0 International license.
