
Kazakh language Emotional Text-to-Speech

This study focuses on the creation of the Kazakh language Emotional Text-to-Speech (KazEmoTTS) dataset, designed for emotional Kazakh speech synthesis applications. The dataset was developed by the Institute of Smart Systems and Artificial Intelligence, Nazarbayev University, Kazakhstan (henceforth ISSAI).

KazEmoTTS is a collection of 54,760 audio-text pairs, with a total duration of 74.85 hours, featuring 34.23 hours delivered by a female narrator and 40.62 hours by two male narrators.

The emotions considered are “neutral”, “angry”, “happy”, “sad”, “scared/fear”, and “surprised”.

We also developed a TTS model trained on the KazEmoTTS dataset. Objective and subjective evaluations were used to assess the quality of the synthesized speech, yielding a mel-cepstral distortion (MCD) score between 6.02 and 7.67 and a mean opinion score (MOS) between 3.47 and 3.54. To diversify topic coverage, the texts were drawn from a book and Wikipedia articles.
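For readers unfamiliar with the objective metric, the sketch below shows one common way MCD is computed. The page does not say which MCD variant or alignment procedure was used in the evaluation, so this is an assumption: frames are taken to be already time-aligned, the 0th (energy) coefficient is excluded, and the function name `mel_cepstral_distortion` is hypothetical.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB over time-aligned frames of mel-cepstral coefficients.

    Assumptions (not taken from the KazEmoTTS evaluation itself):
    - both inputs are lists of frames, each frame a list of floats;
    - frames are already aligned one-to-one (e.g. by dynamic time warping);
    - coefficient 0 (energy) is skipped, as is common practice.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("need two non-empty, equally long frame sequences")

    # Standard scaling constant: (10 / ln 10) * sqrt(2)
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)

    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        if len(ref) != len(syn):
            raise ValueError("frames must have the same number of coefficients")
        squared_error = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(squared_error)

    return total / len(ref_frames)
```

Lower values mean the synthesized speech is spectrally closer to the reference recording; identical frame sequences give an MCD of 0.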

The KazEmoTTS dataset can be used to develop Kazakh text-to-speech models for numerous applications, such as interactive smart assistant systems, narration of cartoons and other videos, navigation systems, announcement systems, and assistive technologies for people with special needs. Like all of ISSAI’s datasets, the KazEmoTTS dataset is freely available to both academic researchers and industry practitioners from the ISSAI website.

To demonstrate the utility of the KazEmoTTS dataset, ISSAI has developed a demo program for Kazakh language Emotional Text-to-Speech synthesis.

If you use the ISSAI KazEmoTTS dataset for commercial purposes, please add this statement to your product or service:

Our product uses KazEmoTTS: Kazakh language Emotional Text-to-Speech, which is available under a Creative Commons Attribution 4.0 International License (Creative Commons — Attribution 4.0 International — CC BY 4.0).

If you use the ISSAI KazEmoTTS dataset for research, please cite it as:

Adal Abilbekov, Saida Mussakhojayeva, Rustem Yeshpanov, and Huseyin Atakan Varol. KazEmoTTS: Kazakh language Emotional Text-to-Speech. MDPI

To facilitate reproducibility and inspire further research, we have made our code, pre-trained model, and dataset accessible in our GitHub repository.

Demo instructions:

Emotions: [“angry”, “surprise”, “fear”, “happy”, “neutral”, “sad”] (Neutral – by default)

– Insert the Kazakh text in the box below (please use only Cyrillic alphabetic characters and punctuation marks; for better performance, split long text into shorter segments)

– Choose a speaker and a desired emotion

– Then click the “GET AUDIO” button

– The page will reload and the audio for your text will appear under the box; press the Play button to listen to it
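The text preparation described in the first step can be sketched as follows. This is only an illustration, not part of the demo: the helper names, the set of accepted punctuation marks, and the 200-character segment limit are all assumptions, since the page does not specify them.

```python
import re

# Kazakh Cyrillic alphabet: the 33 Russian letters plus 9 Kazakh-specific ones.
KAZAKH_LETTERS = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя" + "әғқңөұүһі"

# Assumption: the demo's exact list of accepted punctuation is not documented.
PUNCTUATION = " .,!?;:-\"'()«»–\n"

ALLOWED = set(KAZAKH_LETTERS + KAZAKH_LETTERS.upper() + PUNCTUATION)


def unsupported_characters(text):
    """Return the sorted list of characters that are not Cyrillic or punctuation."""
    return sorted({ch for ch in text if ch not in ALLOWED})


def split_into_segments(text, max_chars=200):
    """Group whole sentences into segments of at most max_chars characters.

    max_chars=200 is a hypothetical limit. A single sentence longer than
    max_chars is kept as its own segment rather than cut mid-sentence.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    segments, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    text = "Сәлем! Қалайсың? Мен жақсымын."
    print(unsupported_characters(text))
    for segment in split_into_segments(text, max_chars=10):
        print(segment)
```

Each resulting segment can then be pasted into the box separately. Note that digits and Latin letters are reported as unsupported here, so numbers would need to be written out in words first.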

We ask that you use the demo and the KazEmoTTS project only for good purposes, refrain from using them to generate obscene speech, and comply with ethical norms.
