Spatio-Temporal Covariance Descriptors for Action and Gesture Recognition

We propose a new action and gesture recognition method based on spatio-temporal covariance descriptors and a weighted Riemannian locality preserving projection approach that takes into account the curved space formed by the descriptors. The weighted projection is then exploited during boosting to create a final multiclass classification algorithm that employs the most useful spatio-temporal regions. We also show how the descriptors can be computed quickly through the use of integral video representations. Experiments on the UCF sport, CK+ facial expression and Cambridge hand gesture datasets indicate superior performance of the proposed method compared to several recent state-of-the-art techniques. The proposed method is robust and does not require additional processing of the videos, such as foreground detection, interest-point detection or tracking.
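
To illustrate how integral video representations can speed up descriptor computation, the following is a minimal sketch, not the implementation evaluated here. It assumes each pixel of the video already carries a d-dimensional feature vector (the choice of features is left open), and the function names `integral_video`, `box_sum` and `region_covariance` are hypothetical. After a single pass that builds first-order and second-order cumulative sums, the covariance of any axis-aligned spatio-temporal region can be obtained from eight corner lookups per integral.

```python
import numpy as np


def integral_video(features):
    """Build integral videos from per-pixel features of shape (T, H, W, d).

    Returns P with shape (T+1, H+1, W+1, d), the cumulative sums of the
    features, and Q with shape (T+1, H+1, W+1, d, d), the cumulative sums of
    their outer products. Both are zero-padded at index 0 along each axis.
    """
    T, H, W, d = features.shape
    outer = features[..., :, None] * features[..., None, :]
    P = np.zeros((T + 1, H + 1, W + 1, d))
    Q = np.zeros((T + 1, H + 1, W + 1, d, d))
    P[1:, 1:, 1:] = features.cumsum(axis=0).cumsum(axis=1).cumsum(axis=2)
    Q[1:, 1:, 1:] = outer.cumsum(axis=0).cumsum(axis=1).cumsum(axis=2)
    return P, Q


def box_sum(I, t0, t1, y0, y1, x0, x1):
    """Sum over the half-open box [t0, t1) x [y0, y1) x [x0, x1).

    Uses 3D inclusion-exclusion on the eight corners of the box.
    """
    return (I[t1, y1, x1]
            - I[t0, y1, x1] - I[t1, y0, x1] - I[t1, y1, x0]
            + I[t0, y0, x1] + I[t0, y1, x0] + I[t1, y0, x0]
            - I[t0, y0, x0])


def region_covariance(P, Q, t0, t1, y0, y1, x0, x1):
    """Sample covariance (d x d) of the features inside the given box."""
    n = (t1 - t0) * (y1 - y0) * (x1 - x0)
    p = box_sum(P, t0, t1, y0, y1, x0, x1)   # sum of feature vectors
    q = box_sum(Q, t0, t1, y0, y1, x0, x1)   # sum of outer products
    return (q - np.outer(p, p) / n) / (n - 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(6, 8, 10, 4))   # toy video with d = 4 features
    P, Q = integral_video(feats)
    C = region_covariance(P, Q, 1, 5, 2, 7, 3, 9)
    print(C.shape)
```

The cost of building `P` and `Q` is paid once per video, after which each region costs a fixed number of lookups regardless of its size. This matters when many candidate regions are examined, as in the boosting stage described above.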