Welcome to ISSAI’s Tilmash project, which provides two-way machine translation among six languages: Kazakh, Russian, English, Turkish, Tatar, and Uzbek.
Our translation model was fine-tuned from Facebook’s NLLB model, which is designed to translate across 202 languages.
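As a rough illustration of how an NLLB-style model is addressed, the sketch below maps the six Tilmash languages to the FLORES-200 codes that NLLB uses and enumerates the translation directions. The helper names (`LANG_CODES`, `translation_directions`, `build_translator`) are hypothetical, and the default checkpoint is the public NLLB distilled model rather than the Tilmash weights. Whether Tilmash uses exactly these codes (for example `uzn_Latn` for Uzbek) is an assumption.

```python
from itertools import permutations

# FLORES-200 codes used by NLLB; assumed (not confirmed) to apply to Tilmash as well.
LANG_CODES = {
    "Kazakh": "kaz_Cyrl",
    "Russian": "rus_Cyrl",
    "English": "eng_Latn",
    "Turkish": "tur_Latn",
    "Tatar": "tat_Cyrl",
    "Uzbek": "uzn_Latn",  # NLLB lists Northern Uzbek in Latin script
}


def translation_directions():
    """Return every ordered (source, target) pair of distinct languages."""
    return list(permutations(LANG_CODES, 2))


def build_translator(src, tgt, model_name="facebook/nllb-200-distilled-600M"):
    """Create a Hugging Face translation pipeline for one direction.

    model_name defaults to a public NLLB checkpoint (an assumption);
    substitute the fine-tuned checkpoint you actually want to use.
    """
    # Imported lazily so the mapping above works without transformers installed.
    from transformers import pipeline

    return pipeline(
        "translation",
        model=model_name,
        src_lang=LANG_CODES[src],
        tgt_lang=LANG_CODES[tgt],
    )


# Usage (downloads model weights on first call):
# translator = build_translator("Kazakh", "English")
# print(translator("Сәлем, әлем!")[0]["translation_text"])
```

Six languages give 30 ordered translation directions, which is what `translation_directions()` enumerates.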
Our model was trained on a range of data sources, including official government websites (e.g., the official site of the President of the Republic of Kazakhstan and the State of the Nation Address), news articles, phrasebooks, specialized terminology, and TED Talks. Over two years, our team of linguists reviewed and refined these data in Kazakh, Russian, English, and Turkish. Additionally, we incorporated English-language resources automatically translated into Kazakh, Russian, and Turkish. We also translated the corpus into Tatar and Uzbek and trained the model on these data.
The result is a machine translation model whose BLEU and ChrF scores are comparable to those of Google Translate and Yandex Translate for translation between Kazakh and English, Russian, and Turkish. The tables below compare Tilmash with both systems.
Translation from Kazakh
Target language | Metric | Google Translate | Yandex Translate | Tilmash |
English | BLEU | 0.32 | 0.29 | 0.32 |
English | ChrF | 0.63 | 0.61 | 0.63 |
Russian | BLEU | 0.26 | 0.26 | 0.27 |
Russian | ChrF | 0.59 | 0.60 | 0.60 |
Turkish | BLEU | 0.21 | 0.13 | 0.16 |
Turkish | ChrF | 0.58 | 0.52 | 0.55 |
Translation into Kazakh
Source language | Metric | Google Translate | Yandex Translate | Tilmash |
English | BLEU | 0.27 | 0.18 | 0.21 |
English | ChrF | 0.63 | 0.58 | 0.60 |
Russian | BLEU | 0.21 | 0.20 | 0.20 |
Russian | ChrF | 0.60 | 0.60 | 0.60 |
Turkish | BLEU | 0.17 | 0.13 | 0.15 |
Turkish | ChrF | 0.56 | 0.53 | 0.55 |
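To give a sense of what the ChrF rows above measure, here is a simplified character n-gram F-score on the same 0 to 1 scale. This is a minimal sketch, not the evaluation code used for the tables: it ignores whitespace, averages precision and recall over n-gram orders 1 to 6, and uses beta = 2, which are common ChrF defaults but are assumptions here. The function names are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams of order n, ignoring spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified sentence-level ChrF in the range [0, 1]."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            # One side is too short to have n-grams of this order.
            continue
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


# Usage: compare a system output against a reference translation.
# score = chrf("the cat sat on the mat", "the cat sits on the mat")
```

For reported results, an established implementation such as sacreBLEU is preferable, since tokenization and smoothing details affect the scores.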
The tables below show that training on automatically translated data improved both BLEU and ChrF in every direction from and into Tatar and Uzbek, compared to the NLLB base model.
Direction | Metric | Base | Tilmash |
From Tatar | BLEU | 0.10 | 0.16 |
From Tatar | ChrF | 0.49 | 0.54 |
Into Tatar | BLEU | 0.08 | 0.10 |
Into Tatar | ChrF | 0.47 | 0.49 |
Direction | Metric | Base | Tilmash |
From Uzbek | BLEU | 0.09 | 0.15 |
From Uzbek | ChrF | 0.44 | 0.53 |
Into Uzbek | BLEU | 0.07 | 0.12 |
Into Uzbek | ChrF | 0.49 | 0.56 |
We have also developed a demo so that you can try the model firsthand.
Please keep your input text under 800 characters.
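For longer passages, one option is to split the text into pieces that each stay under the 800-character limit before submitting them one at a time. The sketch below is a hypothetical helper, not part of the demo: it assumes the limit counts every character including spaces, and it splits at sentence-ending punctuation where possible.

```python
import re


def split_into_chunks(text, limit=800):
    """Split text into chunks that are each shorter than `limit` characters."""
    # Break after ., ! or ? followed by whitespace (a simple heuristic).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence that is too long is cut at fixed positions.
        while len(sentence) >= limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit - 1])
            sentence = sentence[limit - 1:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) < limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


# Usage: translate each chunk separately, then join the results.
# for chunk in split_into_chunks(long_text):
#     print(len(chunk), chunk[:40])
```

Splitting at sentence boundaries keeps each piece self-contained, which tends to matter for translation quality more than filling every chunk to the limit.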