BAD: Bidirectional Auto-regressive Diffusion for Text-to-Motion Generation

Autoregressive models excel at modeling sequential dependencies by enforcing causal constraints, yet they struggle to capture complex bidirectional patterns due to their unidirectional nature. In contrast, mask-based models leverage bidirectional context, enabling richer dependency modeling. However, they often assume token independence during prediction, which undermines the modeling of sequential dependencies. Additionally, the corruption of sequences through masking or absorption can introduce unnatural distortions, complicating the learning process. To address these issues, we propose Bidirectional Autoregressive Diffusion (BAD), a novel approach that unifies the strengths of autoregressive and mask-based generative models. BAD utilizes a permutation-based corruption technique that preserves the natural sequence structure while enforcing causal dependencies through randomized ordering, enabling the effective capture of both sequential and bidirectional relationships. Comprehensive experiments show that BAD outperforms autoregressive and mask-based models in text-to-motion generation, suggesting a novel pre-training strategy for sequence modeling. The codebase for BAD is available at https://github.com/RohollahHS/BAD.
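The permutation-based corruption described above can be illustrated with a minimal sketch. This is a hypothetical implementation, not the authors' code: the function name `permutation_corrupt`, the `mask_id` sentinel, and the prefix-sampling scheme are all assumptions. The idea it demonstrates is that a random permutation over positions defines a causal decoding order, and masking only the "later" tokens in that order corrupts the sequence while every surviving token stays at its original position, so the natural sequence structure is preserved.

```python
import random

def permutation_corrupt(tokens, mask_id, rng=random):
    """Hypothetical sketch: sample a random ordering over positions,
    reveal a random-length prefix of that ordering, and mask the rest.
    Tokens keep their original positions in the output sequence."""
    n = len(tokens)
    order = list(range(n))
    rng.shuffle(order)            # randomized causal ordering over positions
    k = rng.randint(0, n)         # size of the revealed (already-decoded) prefix
    revealed = set(order[:k])
    corrupted = [t if i in revealed else mask_id
                 for i, t in enumerate(tokens)]
    return corrupted, order

# Example: corrupt an 8-token sequence with a fixed seed.
rng = random.Random(0)
tokens = list(range(8))
corrupted, order = permutation_corrupt(tokens, mask_id=-1, rng=rng)
```

During training, a model conditioned on `corrupted` and `order` would predict the masked tokens in the sampled order, attending bidirectionally to the revealed ones, which is how such a scheme can combine autoregressive and mask-based objectives.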