ISSAI - Institute of Smart Systems and Artificial Intelligence

Multilingual Speech Command Recognition with Language Identification

Multilingual Speech Command Recognition (SCR) facilitates voice interaction in environments where multiple languages are used interchangeably, a common characteristic of multilingual regions. In such settings, SCR and language identification (LID) are typically handled by separate models, which increases inference time, energy consumption, and memory usage. To address this inefficiency, we propose a unified multitask model that performs SCR and LID simultaneously using a shared encoder and two task-specific output heads. We evaluated our approach on 15 languages: English, Kazakh, Tatar, Russian, Arabic, Turkish, French, German, Catalan, Spanish, Polish, Dutch, Persian, Kinyarwanda, and Italian. We trained and compared four kinds of model: a monolingual SCR model for each language, a multilingual SCR model without LID, a multitask multilingual SCR model with LID, and an LID-only model. The multitask model achieves an average accuracy of 90.73% for SCR and 90.99% for LID, outperforming both the multilingual SCR model without LID and the LID-only model. We have made the source code and pretrained models available at https://github.com/IS2AI/Keyword-MLP-LangID to promote research in this area.
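
To make the shared-encoder, two-head idea concrete, here is a minimal pure-Python sketch. It is not the released implementation; see the repository above for that. The single-layer encoder, the layer sizes, the count of 35 commands, the class name `MultitaskModel`, and the `lid_weight` loss weighting are all illustrative assumptions. Only the shared encoder, the two task-specific heads, and the 15 languages come from the abstract.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Create small random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, weights, bias):
    """Affine map: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def cross_entropy(logits, target):
    """Negative log-likelihood of the target class under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


class MultitaskModel:
    """Hypothetical stand-in: one shared encoder feeding two output heads."""

    def __init__(self, n_features, n_hidden, n_commands, n_languages, seed=0):
        rng = random.Random(seed)
        self.encoder = init_layer(n_features, n_hidden, rng)    # shared by both tasks
        self.scr_head = init_layer(n_hidden, n_commands, rng)   # command logits
        self.lid_head = init_layer(n_hidden, n_languages, rng)  # language logits

    def forward(self, features):
        # The encoder runs once; both heads reuse the same hidden representation.
        hidden = [max(0.0, h) for h in linear(features, *self.encoder)]
        return linear(hidden, *self.scr_head), linear(hidden, *self.lid_head)


def joint_loss(scr_logits, lid_logits, command, language, lid_weight=1.0):
    """Weighted sum of the two task losses (the weighting is an assumption)."""
    return cross_entropy(scr_logits, command) + lid_weight * cross_entropy(lid_logits, language)


# Toy input: 40 random values standing in for acoustic features of one utterance.
rng = random.Random(1)
features = [rng.uniform(-1.0, 1.0) for _ in range(40)]

model = MultitaskModel(n_features=40, n_hidden=16, n_commands=35, n_languages=15)
scr_logits, lid_logits = model.forward(features)
loss = joint_loss(scr_logits, lid_logits, command=3, language=1)
print(len(scr_logits), len(lid_logits))  # prints "35 15"
```

Because the encoder is evaluated once per utterance and only the two small heads differ, a single forward pass yields both predictions, which is the source of the inference-time and memory savings the abstract describes relative to running separate SCR and LID models.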
