AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models

We present AnimaX, a feed-forward 3D animation framework that bridges the motion priors of video diffusion models with the controllable structure of skeleton-based animation. Traditional motion synthesis methods are either restricted to fixed skeletal topologies or require costly optimization in high-dimensional deformation spaces. In contrast, AnimaX effectively transfers video-based motion knowledge to the 3D domain, supporting diverse articulated meshes with arbitrary skeletons. Our method represents 3D motion as multi-view, multi-frame 2D pose maps, and enables joint video-pose diffusion conditioned on template renderings and a textual motion prompt. We introduce shared positional encodings and modality-aware embeddings to ensure spatial-temporal alignment between video and pose sequences, effectively transferring video priors to the motion generation task. The resulting multi-view pose sequences are triangulated into 3D joint positions and converted into mesh animation via inverse kinematics. Trained on a newly curated dataset of 160,000 rigged sequences, AnimaX achieves state-of-the-art results on VBench in generalization, motion fidelity, and efficiency, offering a scalable solution for category-agnostic 3D animation. Project page: https://anima-x.github.io/.
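To make the alignment mechanism concrete, below is a minimal PyTorch sketch of the idea the abstract describes: video and pose tokens at the same spatial-temporal position share one positional encoding, while a learned modality embedding distinguishes the two streams. All names, shapes, and module choices here are illustrative assumptions, not AnimaX's actual architecture.

```python
import torch
import torch.nn as nn

class JointTokenEmbedding(nn.Module):
    """Hedged sketch: shared positional encoding across video and pose
    tokens, plus a modality-aware embedding per stream. Hypothetical
    names/shapes, not the paper's implementation."""

    def __init__(self, dim: int, max_tokens: int = 4096):
        super().__init__()
        self.pos = nn.Embedding(max_tokens, dim)   # shared across both modalities
        self.modality = nn.Embedding(2, dim)       # 0 = video token, 1 = pose token

    def forward(self, video_tokens: torch.Tensor, pose_tokens: torch.Tensor):
        # video_tokens, pose_tokens: (batch, n_tokens, dim), aligned so that
        # token i in both streams refers to the same (view, frame, patch) slot.
        n = video_tokens.shape[1]
        idx = torch.arange(n, device=video_tokens.device)
        shared_pe = self.pos(idx)                  # identical PE for both streams
        video = video_tokens + shared_pe + self.modality.weight[0]
        pose = pose_tokens + shared_pe + self.modality.weight[1]
        return video, pose
```

Because the positional table is shared, the diffusion backbone sees corresponding video and pose tokens at the same coordinates, and only the modality embedding tells them apart, which is what lets the video prior carry over to pose generation.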
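The lifting step the abstract mentions, turning multi-view 2D pose maps into 3D joint positions, is standard linear triangulation. The sketch below shows the textbook direct linear transform (DLT) under the assumption that the camera projection matrices of the rendered template views are known; it is a generic reference implementation, not code from the paper.

```python
import numpy as np

def triangulate_joint(points_2d: np.ndarray, proj_mats: np.ndarray) -> np.ndarray:
    """DLT triangulation of one joint from V calibrated views.

    points_2d: (V, 2) pixel coordinates of the joint in each view.
    proj_mats: (V, 3, 4) camera projection matrices (assumed known, since
               the pose maps are rendered from fixed template viewpoints).
    Returns the 3D joint position as a (3,) array.
    """
    rows = []
    for (u, v), P in zip(points_2d, proj_mats):
        # Each view contributes two linear constraints on the homogeneous
        # point X:  u * (P[2] @ X) = P[0] @ X  and  v * (P[2] @ X) = P[1] @ X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # Least-squares solution: right singular vector for the smallest
    # singular value of A.
    _, _, vh = np.linalg.svd(A)
    X = vh[-1]
    return X[:3] / X[3]
```

Running this per joint and per frame yields the 3D joint trajectories that inverse kinematics then converts into the final skeletal mesh animation.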