Tilmash

Welcome to ISSAI’s Tilmash project, which enables two-way machine translation among six languages: Kazakh, Russian, English, Turkish, Tatar, and Uzbek.

Our translation model was obtained by fine-tuning Facebook’s NLLB (No Language Left Behind) model, which was designed to translate across 202 languages.

Our model was trained on a range of data sources, including official government websites (e.g., the official site of the President of the Republic of Kazakhstan and the State of the Nation Address), news articles, phrasebooks, specialized terminology, and TED Talks. Over two years, our team of linguists reviewed and corrected these data in Kazakh, Russian, English, and Turkish. We also incorporated English-language resources automatically translated into Kazakh, Russian, and Turkish. Finally, we translated the corpus into Tatar and Uzbek and trained the model on these data as well.

The result is a machine translation model that rivals the translation engines of Google and Yandex on two standard metrics, BLEU and ChrF (higher is better for both). The tables below compare Tilmash with Google Translate and Yandex Translate for translation from and into Kazakh.

Translation from Kazakh

Target language   System             BLEU   ChrF
English           Google Translate   0.32   0.63
English           Yandex Translate   0.29   0.61
English           Tilmash            0.32   0.63
Russian           Google Translate   0.26   0.59
Russian           Yandex Translate   0.26   0.60
Russian           Tilmash            0.27   0.60
Turkish           Google Translate   0.21   0.58
Turkish           Yandex Translate   0.13   0.52
Turkish           Tilmash            0.16   0.55

Translation into Kazakh

Source language   System             BLEU   ChrF
English           Google Translate   0.27   0.63
English           Yandex Translate   0.18   0.58
English           Tilmash            0.21   0.60
Russian           Google Translate   0.21   0.60
Russian           Yandex Translate   0.20   0.60
Russian           Tilmash            0.20   0.60
Turkish           Google Translate   0.17   0.56
Turkish           Yandex Translate   0.13   0.53
Turkish           Tilmash            0.15   0.55

The tables below show that training on automatically translated data improves translation quality from and into Tatar and Uzbek relative to the NLLB base model.

Direction     Metric   NLLB base   Tilmash
From Tatar    BLEU     0.10        0.16
From Tatar    ChrF     0.49        0.54
Into Tatar    BLEU     0.08        0.10
Into Tatar    ChrF     0.47        0.49

Direction     Metric   NLLB base   Tilmash
From Uzbek    BLEU     0.09        0.15
From Uzbek    ChrF     0.44        0.53
Into Uzbek    BLEU     0.07        0.12
Into Uzbek    ChrF     0.49        0.56
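
For readers unfamiliar with ChrF, one of the two metrics reported in the tables above, the following Python sketch may help build intuition. It is a simplified, sentence-level character n-gram F-score on a 0 to 1 scale, not the evaluation script used to produce the reported numbers. The function names, the choice of n-grams up to length 6, beta = 2, and the removal of whitespace are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace (an assumption)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-1 scale (illustrative only)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip n-gram orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)  # average precision over orders
    r = sum(recalls) / len(recalls)        # average recall over orders
    if p + r == 0:
        return 0.0
    # F-beta score: beta > 1 weights recall more heavily than precision.
    return (1 + beta**2) * p * r / (beta**2 * p + r)


if __name__ == "__main__":
    print(chrf("Сәлем, әлем!", "Сәлем, әлем!"))  # identical strings score 1.0
    print(chrf("the cat sat", "the cat sat down"))
```

Because the score is built from character n-grams rather than whole words, a translation that gets a word stem right but an ending wrong still earns partial credit, which is one reason this kind of metric is often reported alongside BLEU for morphologically rich languages such as Kazakh.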

We have also developed a demo so that you can try our model firsthand.

  1. Choose your source language (the one you are translating from).
  2. Select your target language (the one you are translating to).
  3. Type or paste your text into the left-hand field.
  4. Hit the “Translate” button. Your translated text will appear in the right-hand field.

Please keep your text under 800 characters.
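
If your text is longer than the limit, one option is to split it into smaller pieces and translate them one at a time. The sketch below is a hypothetical helper, not part of the demo: `split_for_demo` and `MAX_CHARS` are names introduced here for illustration, and it assumes that the limit counts characters the way Python's `len` does and that a chunk of exactly 800 characters is accepted.

```python
import re

MAX_CHARS = 800  # limit stated for the demo's input field


def split_for_demo(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters,
    preferring to break at sentence boundaries."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that exceeds the limit on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate  # sentence still fits in the current chunk
        else:
            chunks.append(current)  # close the chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    long_text = "A" * 500 + ". " + "B" * 500 + "."
    for piece in split_for_demo(long_text):
        print(len(piece))
```

Splitting at sentence boundaries keeps each piece self-contained, which generally suits a translation model better than cutting in the middle of a sentence. The hard split is only a fallback for a single sentence that is longer than the limit.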
