The visually impaired and blind often face a range of socioeconomic problems that can make it difficult for them to live independently and participate fully in society. Advances in machine learning open new avenues for implementing assistive devices for this population. In this work, we combined image captioning and text-to-speech technologies to create an assistive device for the visually impaired and blind.
Our system provides descriptive auditory feedback to the user in the Kazakh language for a scene acquired in real time by a head-mounted camera. The image captioning model for the Kazakh language achieved satisfactory results in both quantitative metrics and subjective evaluation. Finally, experiments with a healthy blindfolded subject demonstrated the feasibility of our approach.
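To make the described pipeline concrete, the sketch below shows one possible capture-caption-speak loop under stated assumptions: frames are read from the head-mounted camera with OpenCV, while `caption_image` and `synthesize_speech` are hypothetical placeholders standing in for the Kazakh image-captioning model and text-to-speech engine, whose actual interfaces are not specified here.

```python
# Minimal sketch of the capture -> caption -> speech loop, assuming an
# OpenCV-readable head-mounted camera. caption_image() and
# synthesize_speech() are hypothetical stand-ins for the Kazakh
# captioning model and TTS engine.

import cv2  # OpenCV, used here only to grab frames from the camera


def caption_image(frame) -> str:
    """Hypothetical wrapper around the Kazakh image-captioning model."""
    # A real implementation would run the trained captioning network here.
    return "placeholder caption"


def synthesize_speech(text: str) -> None:
    """Hypothetical wrapper around a Kazakh text-to-speech engine."""
    # A real implementation would convert the caption to audio and play it.
    print(f"[speaking] {text}")


def run(camera_index: int = 0, max_frames: int = 10) -> None:
    cap = cv2.VideoCapture(camera_index)  # head-mounted camera
    try:
        for _ in range(max_frames):
            ok, frame = cap.read()
            if not ok:
                break
            caption = caption_image(frame)  # describe the current scene
            synthesize_speech(caption)      # speak the description to the user
    finally:
        cap.release()


if __name__ == "__main__":
    run()
```

In a deployed system the loop would run continuously and the placeholder functions would be replaced by the trained captioning model and a Kazakh TTS backend; the sketch only illustrates how the two components are chained.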