Publication

Multi-Modal Vision and Language Models for Real-Time Emergency Response

Recent advancements in ambient assisted living (AAL) technologies leverage machine learning (ML) and deep learning (DL) for improved emergency response and preventive care. This research introduces a multi-modal system built around an advanced vision-language model (VLM) to enhance detection capabilities in AAL settings. Using DL, the system interprets scenes to generate captions, answer visual questions, and support commonsense reasoning. An interactive chatbot, built on a large language model (LLM) with text-to-speech and speech-to-text capabilities, enables real-time assessment of abnormal behavior. The system uses prompt engineering to refine anomaly detection without extensive retraining, and it autonomously dispatches ambulances and generates alerts. Qualitative analysis confirms high usability among study participants, while quantitative assessments show a detection accuracy of 93.44%, a recall rate of 95%, and a specificity rate of 88.88%. User interactions further raise accuracy to 100%. This multi-modal system improves emergency recognition and response, providing caregivers with actionable insights in real time.
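
To illustrate the prompt-engineering idea described above, the sketch below shows how a scene caption from a VLM might be screened for emergencies by an LLM using a fixed prompt, with an alert raised on a positive verdict. The prompt wording, function names, and threshold logic are assumptions for illustration only, not the paper's exact implementation.

```python
# Minimal sketch: prompt-engineered anomaly screening on top of a VLM caption.
# The prompt text, parsing rule, and notification hook are illustrative assumptions.

from dataclasses import dataclass
from typing import Callable

ANOMALY_PROMPT = (
    "You are monitoring an ambient assisted living environment.\n"
    "Scene caption: {caption}\n"
    "Question: Does this scene show a fall, collapse, or other emergency? "
    "Answer 'yes' or 'no' and give a one-sentence reason."
)

@dataclass
class Assessment:
    is_emergency: bool
    rationale: str

def assess_scene(caption: str, llm: Callable[[str], str]) -> Assessment:
    """Query an LLM with the engineered prompt and parse its yes/no verdict."""
    reply = llm(ANOMALY_PROMPT.format(caption=caption)).strip()
    return Assessment(is_emergency=reply.lower().startswith("yes"), rationale=reply)

def dispatch_if_needed(assessment: Assessment, notify: Callable[[str], None]) -> None:
    """Raise an alert (e.g., caregiver notification or ambulance dispatch) on a positive verdict."""
    if assessment.is_emergency:
        notify(f"EMERGENCY detected: {assessment.rationale}")

if __name__ == "__main__":
    # Stub LLM for demonstration; in practice this would call the system's chat model.
    fake_llm = lambda prompt: "Yes, the person appears to have fallen and is lying motionless."
    result = assess_scene("an elderly man lying on the kitchen floor", fake_llm)
    dispatch_if_needed(result, notify=print)
```

Because the anomaly criteria live in the prompt rather than in model weights, this kind of check can be adjusted without retraining, which is the benefit the abstract attributes to prompt engineering.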

Information about the publication

Authors:

Adil Zhiyenbayev, Rakhat Abdrakhmanov, Huseyin Atakan Varol, Adnan Yazici