At ISSAI, we have previously developed automatic speech recognition systems for the Kazakh language. Now, leveraging our advances in Kazakh ASR, we have extended our work to a multilingual ASR model that can recognize ten Turkic languages—Azerbaijani, Bashkir, Chuvash, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Uyghur, and Uzbek.
The multilingual models that were trained using joint speech data performed more robustly than the baseline monolingual models, with the best model achieving an average character and word error rate reduction of 56% and 54%, respectively.
The results of the experiments demonstrated that character and word error rate reduction was more likely when multilingual models were trained with data from related Turkic languages than when they were developed using data from unrelated, non-Turkic languages, such as English and Russian.
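For readers unfamiliar with the metrics, the sketch below shows how character and word error rates are commonly computed from the edit distance between a reference transcript and a recognized hypothesis, and how a percentage reduction against a baseline can be derived. It is a minimal illustration, not the evaluation code from the repository: the function names and the sample sentences are hypothetical, and it assumes that the reported reductions are relative to the monolingual baseline.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (minimum number of substitutions, insertions, and deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def relative_reduction(baseline, improved):
    """Relative error rate reduction in percent (assumed definition)."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical example sentences, not taken from any of the corpora.
reference = "men kitap oqydym"
hypothesis = "men kitap oqydy"
print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"CER: {cer(reference, hypothesis):.3f}")
```

Under this definition, a baseline error rate of 20% that drops to 8% corresponds to a relative reduction of 60%.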
The study also presented an open-source Turkish speech corpus. The corpus contains 218.2 hours of transcribed speech with 186,171 utterances and is the largest publicly available Turkish dataset of its kind. The datasets and code used to train the models are available for download at https://github.com/IS2AI/TurkicASR.
To demonstrate the utility of the multilingual automatic speech recognition model for Turkic languages, ISSAI has developed a demo program that recognizes the ten Turkic languages as well as Russian and English.
If you use the ISSAI Multilingual Automatic Speech Recognition for Turkic languages for commercial purposes, please add this statement to your product or service:
Our product uses Turkish Speech Corpus (https://doi.org/10.48342/0xes-sf45), which is available under a Creative Commons Attribution 4.0 International License.
If you use the ISSAI Multilingual Automatic Speech Recognition for Turkic languages for research, please cite it as:
Mussakhojayeva, S.; Dauletbek, K.; Yeshpanov, R.; Varol, H.A. Multilingual Speech Recognition for Turkic Languages. Information 2023, 14, 74. (https://doi.org/10.3390/info14020074)
Demo instructions:
Click the “RECORD” button, start speaking immediately, and continue until the countdown reaches zero. The recognized output will be displayed above the “RECORD” button after 10 seconds. Please note that some browsers do not support audio recording.