
TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts

Guo, Chuan; Zuo, Xinxin; Wang, Sen; Cheng, Li
Abstract

Inspired by the strong connections between vision and language, two closely intertwined human sensory and communication modalities, our paper investigates the generation of 3D full-body human motions from text (text2motion) as well as the reciprocal task of generating text from motions (motion2text). To address two open challenges, namely generating multiple distinct motions from the same text and avoiding trivial, motionless pose sequences, we propose motion tokens, a discrete and compact motion representation. With motions represented as motion tokens and texts as text tokens, both signals can be handled on a common footing. Furthermore, our motion2text module is integrated into an inverse alignment step of the text2motion training pipeline: when the text synthesized from a generated motion deviates significantly from the input text, the training loss is high. Empirically, this improves performance. Finally, the mappings between the two modalities are learned by adapting a neural machine translation (NMT) model to our setting. Its autoregressive modeling of the distribution over discrete motion tokens further enables non-deterministic generation of pose sequences of varying lengths from a single input text. The approach is flexible and applies to both text2motion and motion2text. Empirical evaluations on two benchmark datasets show that our method outperforms a variety of state-of-the-art approaches on both tasks.

Project page: https://ericguo5513.github.io/TM2T/
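The abstract does not spell out how a motion becomes a sequence of tokens. One common way to build such a discrete vocabulary is vector quantization: each motion feature vector is replaced by the index of its nearest entry in a codebook, so a pose sequence becomes a short list of integers that a translation model can treat like words. The sketch below is an illustration under stated assumptions, not the authors' tokenizer: the codebook is fixed rather than learned, the encoder and decoder networks that would produce and consume the feature vectors are omitted, and the names `tokenize_motion` and `detokenize_motion` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    # Squared Euclidean distance between two equal-length feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tokenize_motion(features: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Replace each motion feature vector with the index of its nearest codebook entry."""
    tokens = []
    for f in features:
        nearest = min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        tokens.append(nearest)
    return tokens


def detokenize_motion(tokens: Sequence[int], codebook: Sequence[Vector]) -> List[List[float]]:
    """Map token ids back to codebook vectors (a decoder network would then produce poses)."""
    return [list(codebook[t]) for t in tokens]


# Hypothetical 3-entry codebook over 2-D features (real features are much higher-dimensional).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
features = [[0.1, -0.05], [0.9, 0.2], [0.2, 0.8], [0.95, 0.0]]
tokens = tokenize_motion(features, codebook)
print(tokens)  # → [0, 1, 2, 1]
print(detokenize_motion(tokens, codebook))
```

Nearby feature vectors (the second and fourth) collapse onto the same token, which is what makes the representation compact; the price is that reconstruction returns codebook entries rather than the exact input features.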
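The abstract's point that autoregressive modeling over motion tokens yields non-deterministic, variable-length generation can be illustrated by sampling tokens one at a time until an end-of-motion token is drawn. A minimal sketch, assuming a hypothetical `toy_decoder` that stands in for the text-conditioned NMT decoder (a trained network would supply these next-token probabilities) and a hypothetical end-token id `END`:

```python
import random
from typing import Callable, Dict, List

END = -1  # hypothetical end-of-motion token id

# A decoder maps the tokens generated so far to a distribution over the next token.
NextTokenDist = Callable[[List[int]], Dict[int, float]]


def sample_motion_tokens(next_token_dist: NextTokenDist, rng: random.Random, max_len: int = 50) -> List[int]:
    """Sample tokens autoregressively until END is drawn or max_len is reached."""
    tokens: List[int] = []
    while len(tokens) < max_len:
        dist = next_token_dist(tokens)
        # Sampling (not argmax) is what makes repeated calls return different sequences.
        tok = rng.choices(list(dist.keys()), weights=list(dist.values()), k=1)[0]
        if tok == END:
            break
        tokens.append(tok)
    return tokens


def toy_decoder(prefix: List[int]) -> Dict[int, float]:
    """Stand-in decoder over 8 motion tokens: prefers repeating the last token,
    and the probability of ending grows with the sequence length."""
    p_end = min(0.05 * len(prefix), 0.9)
    last = prefix[-1] if prefix else 0
    rest = 1.0 - p_end
    return {END: p_end, last: rest * 0.6, (last + 1) % 8: rest * 0.4}


rng = random.Random(0)
for _ in range(3):
    seq = sample_motion_tokens(toy_decoder, rng)
    print(len(seq), seq)
```

Because the length is decided by when `END` is sampled rather than fixed in advance, repeated calls for the same conditioning can return sequences of different lengths and content; greedy argmax decoding would instead return a single fixed sequence.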
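The inverse alignment step described in the abstract can be read as an extra loss term: a motion2text model reads the generated motion, and the cross-entropy between its predicted text and the original input text is added to the text2motion loss, so a large deviation yields a high loss. The sketch below computes only that scalar under assumptions not stated in the abstract: predictions are teacher-forced (one distribution per input-text position), the weight `0.1` and the function names are hypothetical, and real training would additionally need gradients to flow back through the generated motion tokens, which plain Python does not model.

```python
import math
from typing import Sequence


def inverse_alignment_loss(
    pred_text_probs: Sequence[Sequence[float]],
    input_text_ids: Sequence[int],
    eps: float = 1e-12,
) -> float:
    """Mean negative log-likelihood of the input text under motion2text predictions.

    pred_text_probs[i] is the distribution over the text vocabulary at position i,
    computed from the generated motion; input_text_ids[i] is the matching token of
    the original input text.
    """
    if len(pred_text_probs) != len(input_text_ids) or not input_text_ids:
        raise ValueError("expected a non-empty text and one distribution per token")
    # eps guards against log(0) when the model assigns zero probability.
    nll = [-math.log(max(probs[t], eps)) for probs, t in zip(pred_text_probs, input_text_ids)]
    return sum(nll) / len(nll)


def text2motion_objective(t2m_loss: float, align_loss: float, weight: float = 0.1) -> float:
    # Total loss: the usual text2motion loss plus the weighted alignment penalty.
    return t2m_loss + weight * align_loss


text = [2, 0]
faithful = [[0.05, 0.05, 0.9], [0.9, 0.05, 0.05]]  # motion2text recovers the input text
drifted = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05]]   # synthesized text deviates from it
print(inverse_alignment_loss(faithful, text) < inverse_alignment_loss(drifted, text))  # → True
```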
