Publication

Real-Time Multispectral Human Pose Estimation

Human pose estimation (HPE) is essential for human motion analysis. Numerous RGB datasets are available to train deep learning-based HPE models. However, poor lighting conditions and privacy concerns pose challenges in the visible domain. Thermal cameras can address these issues because they are invariant to illumination. However, few annotated thermal HPE datasets exist for training deep learning models. Moreover, HPE models are typically trained for either the thermal or the visible domain, with little exploration of cross-domain knowledge transfer. In this work, we trained YOLO11-pose models for multispectral human pose estimation by fusing the COCO and OpenThermalPose2 datasets. The results show that our models achieve high accuracy in both domains, even outperforming models specialized for each domain. The largest model, YOLO11x-pose, achieved a pose AP50:95 of 95.23% on the test set of OpenThermalPose2, establishing a new benchmark for this dataset. The model also achieved a pose AP50:95 of 69.89% on the COCO validation set, slightly improving on the results of the original YOLO11x-pose model. We optimized the models and deployed them on an NVIDIA Jetson AGX Orin 64GB. The models in TensorRT format with half-precision floating point achieved the best balance of speed and accuracy, making them suitable for real-time applications. We have made the pre-trained models publicly available at https://github.com/IS2AI/multispectral-motion-analysis to support research in this area.
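As a minimal, hedged sketch of how the released checkpoints could be used, the Python snippet below loads a pose model with the Ultralytics API, runs keypoint inference, and exports it to TensorRT with half-precision floating point for deployment, as described above. The weight filename and test image are assumptions for illustration; the actual file names are in the GitHub repository.

from ultralytics import YOLO

# Load a multispectral YOLO11x-pose checkpoint
# (filename assumed; see the repository for the released weights)
model = YOLO("yolo11x-pose-multispectral.pt")

# Run keypoint inference on an RGB or thermal image (example filename)
results = model("person_thermal.jpg")
for r in results:
    print(r.keypoints.xy)  # per-person keypoint coordinates

# Export to a TensorRT engine with FP16 (half precision),
# e.g., for deployment on an NVIDIA Jetson AGX Orin
model.export(format="engine", half=True)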

Information about the publication

Authors:

Askat Kuzdeuov; Huseyin Atakan Varol