Interaction Relational Network for Mutual Action Recognition

Person-person mutual action recognition (also referred to as interaction recognition) is an important research branch of human activity analysis. Current solutions in the field, mainly dominated by CNNs, GCNs and LSTMs, often consist of complicated architectures and mechanisms that embed the relationship between the two persons into the architecture itself, to ensure that the interaction patterns can be properly learned. Our main contribution is a simpler yet very powerful architecture, named Interaction Relational Network (IRN), which uses minimal prior knowledge about the structure of the human body. Instead, we drive the network to identify by itself how to relate the body parts of the interacting individuals. To better represent the interaction, we define two different relationships, leading to specialized architectures and models for each. These relationship models are then fused into a single architecture, in order to leverage both streams of information and further enhance the relational reasoning capability. Furthermore, we define structured pair-wise operations to extract meaningful extra information from each pair of joints: distance and motion. Ultimately, through the coupling of an LSTM, our IRN becomes capable of powerful sequential relational reasoning. These extensions to our network can also be valuable to other problems that require sophisticated relational reasoning. Our solution achieves state-of-the-art performance on the traditional interaction recognition datasets SBU and UT, as well as on the mutual actions of the large-scale NTU RGB+D dataset, and it obtains competitive performance on the interactions subset of the NTU RGB+D 120 dataset.
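
As a rough, non-authoritative illustration of the core idea, the sketch below implements a generic pairwise relational module over the joints of two skeletons, followed by an LSTM for sequential reasoning, in PyTorch. The all-pairs scheme, layer sizes, and names (IRNSketch, joint_dim, num_classes) are assumptions made for illustration, not the authors' exact IRN; the Euclidean distance feature stands in for the structured pair-wise operations mentioned above.

import torch
import torch.nn as nn

class IRNSketch(nn.Module):
    """Pairwise relational module plus LSTM, in the spirit of IRN.
    Layer sizes and the pairing scheme are illustrative assumptions,
    not the authors' implementation."""

    def __init__(self, joint_dim=3, hidden=128, num_classes=11):
        super().__init__()
        # g relates one joint from person A with one joint from person B,
        # plus one structured pairwise feature (their Euclidean distance).
        self.g = nn.Sequential(
            nn.Linear(2 * joint_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # The LSTM aggregates per-frame relational summaries over time.
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, a, b):
        # a, b: (batch, time, joints, joint_dim) skeletons of the two persons.
        B, T, J, D = a.shape
        ai = a.unsqueeze(3).expand(B, T, J, J, D)    # joint i of person A
        bj = b.unsqueeze(2).expand(B, T, J, J, D)    # joint j of person B
        dist = (ai - bj).norm(dim=-1, keepdim=True)  # pairwise distance feature
        pairs = torch.cat([ai, bj, dist], dim=-1).reshape(B, T, J * J, -1)
        rel = self.g(pairs).sum(dim=2)               # aggregate over all joint pairs
        out, _ = self.lstm(rel)                      # sequential relational reasoning
        return self.classifier(out[:, -1])           # classify from the last step

For example, model = IRNSketch(); logits = model(torch.randn(4, 30, 25, 3), torch.randn(4, 30, 25, 3)) classifies four clips of 30 frames with 25 joints per person (the NTU RGB+D skeleton layout).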