ISSAI - Institute of Smart Systems and Artificial Intelligence

Kazakh language Emotional Text-to-Speech

This study focuses on the creation of the Kazakh language Emotional Text-to-Speech (KazEmoTTS) dataset, designed for various applications. The dataset was developed by the Institute of Smart Systems and Artificial Intelligence, Nazarbayev University, Kazakhstan (henceforth ISSAI).

KazEmoTTS is a collection of 54,760 audio-text pairs, with a total duration of 74.85 hours, featuring 34.23 hours delivered by a female narrator and 40.62 hours by two male narrators.

The emotions considered are “neutral”, “angry”, “happy”, “sad”, “scared/fear”, and “surprised”.

We also developed a TTS model trained on the KazEmoTTS dataset. Objective and subjective evaluations of the synthesized speech yielded mel-cepstral distortion (MCD) scores ranging from 6.02 to 7.67 and mean opinion scores (MOS) ranging from 3.47 to 3.54. The topic coverage of the dataset was diversified with a book and Wikipedia articles.
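For readers unfamiliar with the objective metric, the following is a minimal sketch of one common definition of MCD, not necessarily the exact procedure used in this study. It assumes the reference and synthesized mel-cepstral sequences are already time-aligned (for example, by dynamic time warping) and that the 0th coefficient (energy) is excluded; the function name `mel_cepstral_distortion` is hypothetical.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB between two time-aligned mel-cepstral sequences.

    Each argument is a list of frames; each frame is a list of
    mel-cepstral coefficients. Alignment is assumed to be done already.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("sequences must be non-empty and have equal length")

    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        if len(ref) != len(syn):
            raise ValueError("frames must have the same number of coefficients")
        # Skip coefficient 0 (energy), as is conventional for MCD.
        squared_error = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(2.0 * squared_error)

    # Average the per-frame distortion over all frames.
    return total / len(ref_frames)
```

Lower values indicate that the synthesized speech is spectrally closer to the reference recording; identical sequences give a distortion of 0 dB.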

The KazEmoTTS dataset can be used to develop Kazakh text-to-speech models for numerous applications, such as interactive smart assistant systems, narration of cartoons and other videos, navigation systems, announcement systems, and assistive technologies for people with special needs. Like all ISSAI datasets, KazEmoTTS is freely available to both academic researchers and industry practitioners from the ISSAI website.

To demonstrate the utility of the KazEmoTTS dataset, ISSAI has developed a demo program for Kazakh language Emotional Text-to-Speech synthesis.

If you use the ISSAI KazEmoTTS dataset for commercial purposes, please add this statement to your product or service:

Our product uses KazEmoTTS: Kazakh language Emotional Text-to-Speech, which is available under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

If you use the ISSAI KazEmoTTS dataset for research, please cite it as:

Adal Abilbekov, Saida Mussakhojayeva, Rustem Yeshpanov, and Huseyin Atakan Varol. KazEmoTTS: Kazakh language Emotional Text-to-Speech. MDPI

To facilitate reproducibility and inspire further research, we have made our code, pre-trained model, and dataset accessible in our GitHub repository.

Download data | Download code | Pre-trained models

This work is licensed under a Creative Commons Attribution 4.0 International license.
