6th August 2021

The ISSAI team has developed the Uzbek Speech Corpus

In collaboration with the Image and Speech Processing Laboratory of the Department of Computer Systems at the Tashkent University of Information Technologies, ISSAI has developed the Uzbek Speech Corpus (USC).

The USC comprises 105 hours of transcribed audio recordings from 958 different speakers. To ensure high quality, the recordings and transcriptions were manually checked by native speakers of Uzbek. The USC is primarily designed for automatic speech recognition (ASR) in Uzbek, that is, systems that convert spoken Uzbek into text displayed on the screen. However, the USC can also support other speech-related tasks, such as speech synthesis and speech translation, as well as the development of voice-controlled smart devices. Additionally, this work should facilitate the development of Uzbek-language assistive technologies for people with special needs.

The USC is an open-source Uzbek speech corpus, available for both academic and commercial use under the Creative Commons Attribution 4.0 International License. We expect the USC to be a valuable resource for the wider speech research community and to become the baseline dataset for Uzbek ASR research.

You can try a demo of the automatic Uzbek speech recognition system, built using the Uzbek Speech Corpus, here: