The dataset contains thermal images acquired in controlled indoor (c-indoor), semi-controlled indoor (s-indoor), and uncontrolled outdoor (u-outdoor) environments. The c-indoor dataset was constructed from our previously published SpeakingFaces dataset. The s-indoor and u-outdoor datasets were collected with the same FLIR T540 thermal camera, which has a resolution of 464×348 pixels, a waveband of 7.5–14 μm, and a 24° field of view; images were rendered with the iron color palette. The dataset was manually annotated with face bounding boxes and five-point facial landmarks (the center of the right eye, the center of the left eye, the tip of the nose, the right outer corner of the mouth, and the left outer corner of the mouth).
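To make the annotation contents concrete, here is a minimal sketch of how one annotated face could be held in memory and sanity-checked against the 464×348 image size. It is only an illustration: the `FaceAnnotation` class, the landmark names, the `(x_min, y_min, x_max, y_max)` box convention, and the example coordinates are all assumptions made for this sketch and do not describe the dataset's actual file format.

```python
from dataclasses import dataclass
from typing import Tuple

# Image size in pixels, taken from the camera resolution stated above.
IMAGE_WIDTH, IMAGE_HEIGHT = 464, 348

# Landmark order follows the list in the text; the names themselves are hypothetical.
LANDMARK_NAMES = ("right_eye", "left_eye", "nose_tip", "mouth_right", "mouth_left")

Point = Tuple[float, float]


@dataclass(frozen=True)
class FaceAnnotation:
    """One annotated face (hypothetical layout, not the dataset's file format)."""
    box: Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max) in pixels
    landmarks: Tuple[Point, ...]            # assumed (x, y) pixels, in LANDMARK_NAMES order


def in_image(point: Point) -> bool:
    """Return True if the point lies within the image bounds."""
    x, y = point
    return 0 <= x <= IMAGE_WIDTH and 0 <= y <= IMAGE_HEIGHT


def is_valid(ann: FaceAnnotation) -> bool:
    """Check landmark count, box ordering, and that everything is inside the image."""
    x_min, y_min, x_max, y_max = ann.box
    if len(ann.landmarks) != len(LANDMARK_NAMES):
        return False
    if not (x_min < x_max and y_min < y_max):
        return False
    if not (in_image((x_min, y_min)) and in_image((x_max, y_max))):
        return False
    return all(in_image(p) for p in ann.landmarks)


# Made-up coordinates, for illustration only.
example = FaceAnnotation(
    box=(150.0, 80.0, 290.0, 260.0),
    landmarks=(
        (185.0, 140.0),  # right eye center
        (250.0, 140.0),  # left eye center
        (218.0, 180.0),  # nose tip
        (190.0, 220.0),  # right outer mouth corner
        (245.0, 220.0),  # left outer mouth corner
    ),
)
```

A check such as `is_valid(example)` can be run over loaded annotations to catch swapped box corners or a missing landmark before training.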
Examples of annotated images: