Adaptive Feature Processing for Robust Human Activity Recognition on a Novel Multi-Modal Dataset

Human Activity Recognition (HAR) is a key building block of many emerging applications such as intelligent mobility, sports analytics, ambient-assisted living and human-robot interaction. With robust HAR, systems will become more human-aware, leading towards safer and more empathetic autonomous systems. While human pose detection has made significant progress with the advent of deep convolutional neural networks (CNNs), state-of-the-art research has almost exclusively focused on a single sensing modality, especially video. In safety-critical applications, however, it is imperative to utilize multiple sensor modalities for robust operation, and exploiting state-of-the-art machine learning techniques for HAR therefore requires multi-modal datasets. In this paper, we present a novel multi-modal sensor dataset that encompasses nine indoor activities, performed by 16 participants, and captured by four types of sensors commonly used in indoor applications and autonomous vehicles. This multi-modal dataset is the first of its kind to be made openly available and can be exploited for many applications that require HAR, including sports analytics, healthcare assistance and indoor intelligent mobility. We propose a novel data preprocessing algorithm that enables adaptive feature extraction from the dataset for use by different machine learning algorithms. Through rigorous experimental evaluation, this paper reviews the performance of machine learning approaches to posture recognition and analyses the robustness of the algorithms. When performing HAR with the RGB-Depth data from our new dataset, machine learning algorithms such as a deep neural network reached a mean accuracy of up to 96.8% for classification across all stationary and dynamic activities.
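As an illustrative sketch only, and not the authors' released pipeline, the following Python example shows how a small fully connected network could be trained on pre-extracted RGB-Depth feature vectors for nine-class activity recognition; the feature dimension, network shape, and synthetic stand-in data are all assumptions made for demonstration.

# Illustrative sketch: nine-class activity classification from
# pre-extracted RGB-Depth feature vectors using a small fully
# connected network. Feature size, network shape, and synthetic
# data are assumptions, not the paper's method.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
N_SAMPLES, N_FEATURES, N_ACTIVITIES = 2000, 128, 9  # assumed sizes

# Stand-in for features produced by an adaptive preprocessing step;
# replace with real per-window RGB-Depth features and activity labels.
X = rng.normal(size=(N_SAMPLES, N_FEATURES))
y = rng.integers(0, N_ACTIVITIES, size=N_SAMPLES)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Standardize features, then fit a two-hidden-layer MLP classifier.
scaler = StandardScaler().fit(X_train)
clf = MLPClassifier(hidden_layer_sizes=(256, 128),
                    activation="relu", max_iter=300, random_state=0)
clf.fit(scaler.transform(X_train), y_train)

# Report held-out mean classification accuracy across the nine classes.
pred = clf.predict(scaler.transform(X_test))
print(f"mean accuracy: {accuracy_score(y_test, pred):.3f}")

With real per-window features in place of the synthetic arrays, the same train/test split and accuracy metric correspond to the mean classification accuracy reported in the abstract.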