Learning Diverse Stochastic Human-Action Generators by Learning Smooth Latent Transitions

Human-motion generation is a long-standing challenge because it requires accurately modeling complex and diverse dynamic patterns. Most existing methods adopt sequence models such as RNNs to model transitions directly in the original action space. Due to the high dimensionality and potential noise of that space, such modeling of action transitions is particularly challenging. In this paper, we focus on skeleton-based action generation and propose to model smooth and diverse transitions in a latent space of action sequences with much lower dimensionality. Conditioned on a latent sequence, actions are generated by a frame-wise decoder shared by all latent action-poses. Specifically, an implicit RNN is defined to model smooth latent sequences, whose randomness (diversity) is controlled by noise in the input. Unlike standard action-prediction methods, our model can generate action sequences from pure noise without any conditioning action poses. Remarkably, it can also generate unseen actions from mixed classes seen during training. Our model is learned within a bi-directional generative-adversarial-network framework, which can not only generate diverse action sequences of a particular class or a mixture of classes, but also learns to classify action sequences within the same model. Experimental results show the superiority of our method over existing methods in both diverse action-sequence generation and classification.
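To make the described generator concrete, below is a minimal PyTorch sketch of the idea, not the authors' exact architecture: a GRU evolves a low-dimensional latent sequence driven by input noise and a class label, and a frame-wise MLP decoder, shared across time steps, maps each latent state to a skeleton pose. The layer sizes, the one-hot class conditioning, and the 75-dimensional pose (e.g., 25 joints x 3 coordinates) are illustrative assumptions.

```python
# Hypothetical sketch of a latent-transition action generator; layer sizes
# and conditioning scheme are assumptions, not the paper's specification.
import torch
import torch.nn as nn


class LatentTransitionGenerator(nn.Module):
    def __init__(self, noise_dim=32, num_classes=10, latent_dim=64, pose_dim=75):
        super().__init__()
        self.num_classes = num_classes
        # Implicit RNN over the latent space: per-step noise plus the class
        # label drives smooth latent transitions.
        self.rnn = nn.GRU(noise_dim + num_classes, latent_dim, batch_first=True)
        # Frame-wise decoder shared by all latent action-poses.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, pose_dim),
        )

    def forward(self, noise, labels):
        # noise:  (batch, seq_len, noise_dim) -- randomness controls diversity
        # labels: (batch,) integer class ids
        batch, seq_len, _ = noise.shape
        onehot = torch.nn.functional.one_hot(labels, self.num_classes).float()
        onehot = onehot.unsqueeze(1).expand(-1, seq_len, -1)
        # Smooth latent sequence produced by the recurrent transition model.
        latent_seq, _ = self.rnn(torch.cat([noise, onehot], dim=-1))
        # Decode each latent state independently into a skeleton pose.
        return self.decoder(latent_seq)  # (batch, seq_len, pose_dim)


# Usage: generate 40-frame sequences of chosen action classes from pure noise.
gen = LatentTransitionGenerator()
z = torch.randn(2, 40, 32)
y = torch.tensor([3, 7])
actions = gen(z, y)  # shape: (2, 40, 75)
```

In the full method, such a generator would be trained adversarially within a bi-directional GAN, pairing it with an encoder and a discriminator so that the same model also supports action-sequence classification; that training loop is omitted here.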