Autonomous Human Activity Classification from Ego-vision Camera and Accelerometer Data

There has been a significant amount of research on human activity classification relying either on Inertial Measurement Unit (IMU) data or on data from static cameras providing a third-person view. Using only IMU data limits the variety and complexity of the activities that can be detected. For instance, a sitting activity can be detected from IMU data, but it cannot be determined whether the subject has sat on a chair or a sofa, or where the subject is. To perform fine-grained activity classification from egocentric videos, and to distinguish between activities that cannot be differentiated by IMU data alone, we present an autonomous and robust method using data from both ego-vision cameras and IMUs. In contrast to convolutional neural network-based approaches, we propose to employ capsule networks to obtain features from egocentric video data. Moreover, a Convolutional Long Short-Term Memory framework is employed on both the egocentric videos and the IMU data to capture the temporal aspect of actions. We also propose a genetic algorithm-based approach to set various network parameters autonomously and systematically, rather than relying on manual settings. Experiments have been performed on 9- and 26-label activity classification, and the proposed method, using autonomously set network parameters, provides very promising results, achieving overall accuracies of 86.6\% and 77.2\%, respectively. The proposed approach combining both modalities also yields higher accuracy than using only ego-vision data or only IMU data.
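For illustration, the sketch below shows one way a two-stream temporal model over egocentric video clips and IMU windows could be wired up in Keras and fused for classification. The layer choices, input shapes, and sizes are assumptions made for this example, and the capsule-network feature extractor described above is replaced here by a single ConvLSTM stage; this is not the paper's exact architecture.

```python
# Minimal two-stream sketch (assumed shapes and layer sizes, not the paper's
# exact configuration): ConvLSTM over egocentric frames + LSTM over IMU
# sequences, fused for activity classification.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 9                    # e.g. the 9-label setting
FRAMES, H, W, C = 16, 64, 64, 3    # assumed video clip: 16 frames of 64x64 RGB
IMU_STEPS, IMU_CH = 100, 6         # assumed IMU window: 100 samples x 6 channels

# Video stream: ConvLSTM captures spatio-temporal structure of the clip.
video_in = layers.Input(shape=(FRAMES, H, W, C), name="ego_video")
v = layers.ConvLSTM2D(32, kernel_size=3, padding="same",
                      return_sequences=False)(video_in)
v = layers.GlobalAveragePooling2D()(v)

# IMU stream: an LSTM summarizes the accelerometer/gyroscope window.
imu_in = layers.Input(shape=(IMU_STEPS, IMU_CH), name="imu")
m = layers.LSTM(64)(imu_in)

# Late fusion of the two modality embeddings, followed by classification.
fused = layers.Concatenate()([v, m])
fused = layers.Dense(128, activation="relu")(fused)
out = layers.Dense(NUM_CLASSES, activation="softmax")(fused)

model = models.Model(inputs=[video_in, imu_in], outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

In such a setup, the free choices (filter counts, LSTM width, fusion layer size) are exactly the kind of network parameters that the genetic algorithm-based search mentioned above could set autonomously instead of fixing them by hand.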