
AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation

Shi, Haoyuan; Li, Yunxin; Chen, Xinyu; Wang, Longyue; Hu, Baotian; Zhang, Min
Release Date: 6/15/2025
Abstract

Despite rapid advancements in video generation models, generating coherent storytelling videos that span multiple scenes and characters remains challenging. Current methods often rigidly convert pre-generated keyframes into fixed-length clips, resulting in disjointed narratives and pacing issues. Furthermore, the inherent instability of video generation models means that even a single low-quality clip can significantly degrade the logical coherence and visual continuity of the entire output animation. To overcome these obstacles, we introduce AniMaker, a multi-agent framework enabling efficient multi-candidate clip generation and storytelling-aware clip selection, thus creating globally consistent and story-coherent animation solely from text input. The framework is structured around specialized agents, including the Director Agent for storyboard generation, the Photography Agent for video clip generation, the Reviewer Agent for evaluation, and the Post-Production Agent for editing and voiceover. Central to AniMaker's approach are two key technical components: MCTS-Gen in the Photography Agent, an efficient Monte Carlo Tree Search (MCTS)-inspired strategy that intelligently navigates the candidate space to generate high-potential clips while optimizing resource usage; and AniEval in the Reviewer Agent, the first framework specifically designed for multi-shot animation evaluation, which assesses critical aspects such as story-level consistency, action completion, and animation-specific features by considering each clip in the context of its preceding and succeeding clips. Experiments demonstrate that AniMaker achieves superior quality as measured by popular metrics, including VBench and our proposed AniEval framework, while significantly improving the efficiency of multi-candidate generation, pushing AI-generated storytelling animation closer to production standards.
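The abstract describes MCTS-Gen only at a high level, so the following is not the authors' algorithm. It is a minimal, generic Monte Carlo Tree Search sketch of the same idea under stated assumptions: each tree level is one shot, each node is one candidate clip, and a complete root-to-leaf path is one candidate story. Selection uses UCB1 (exploration constant 1.4, an assumed value); expansion branches a new candidate at the next shot; the "rollout" generates one clip for every remaining shot; the reward is an average context-aware score over adjacent clips, loosely mirroring how AniEval judges each clip alongside its neighbours. The names `generate_clip`, `pair_score`, `story_score`, and `mcts_clip_search`, the 0.7/0.3 scoring weights, and the random "quality" values are all hypothetical placeholders for a real video model and evaluator.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Node:
    shot: int                                    # shot index; -1 marks the root
    quality: float = 0.0                         # simulated clip quality in [0, 1)
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)
    visits: int = 0
    total: float = 0.0                           # sum of backed-up story scores


def generate_clip(shot: int, rng: random.Random) -> float:
    # HYPOTHETICAL stand-in for an expensive video-model call (shot is unused here).
    return rng.random()


def pair_score(prev: Optional[float], cur: float) -> float:
    # HYPOTHETICAL context-aware score: clip quality plus continuity with the
    # previous clip. The first clip of a story is scored on quality alone.
    if prev is None:
        return cur
    return 0.7 * cur + 0.3 * (1.0 - abs(prev - cur))


def story_score(path: List[Node]) -> float:
    # Average pairwise score over a complete story (list of clips in shot order).
    scores = [pair_score(path[i - 1].quality if i else None, n.quality)
              for i, n in enumerate(path)]
    return sum(scores) / len(scores)


def path_to(node: Node) -> List[Node]:
    # Walk up to (but excluding) the root and return clips in shot order.
    path = []
    while node.parent is not None:
        path.append(node)
        node = node.parent
    return path[::-1]


def ucb1(child: Node, parent_visits: int, c: float = 1.4) -> float:
    exploit = child.total / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def mcts_clip_search(num_shots: int, budget: int, max_children: int = 3,
                     seed: int = 0, max_iters: int = 1000) -> Tuple[List[Node], int]:
    """Return (chosen story, number of clip generations used); uses <= budget clips."""
    if num_shots < 1:
        return [], 0
    rng = random.Random(seed)
    root = Node(shot=-1)
    generated = 0
    for _ in range(max_iters):
        # 1. Selection: descend through fully expanded nodes by UCB1.
        node = root
        while node.shot < num_shots - 1 and len(node.children) >= max_children:
            parent = node
            node = max(parent.children, key=lambda ch: ucb1(ch, parent.visits))
        # 2. Expansion + rollout: branch a new candidate at the next shot, then
        #    generate one clip per remaining shot so the story is complete.
        leaf = node
        if node.shot < num_shots - 1:
            cost = num_shots - 1 - node.shot
            if generated + cost > budget:
                break                             # next branch does not fit the budget
            for shot in range(node.shot + 1, num_shots):
                child = Node(shot=shot, quality=generate_clip(shot, rng), parent=leaf)
                leaf.children.append(child)
                leaf = child
            generated += cost
        # 3. Evaluation: score the complete story ending at this leaf.
        reward = story_score(path_to(leaf))
        # 4. Backpropagation: update statistics from the leaf up to the root.
        n: Optional[Node] = leaf
        while n is not None:
            n.visits += 1
            n.total += reward
            n = n.parent
    # Extract the final story by following the most-visited child at each shot.
    story, node = [], root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
        story.append(node)
    return story, generated


if __name__ == "__main__":
    story, used = mcts_clip_search(num_shots=4, budget=30, seed=42)
    print(f"used {used} generations; story score {story_score(story):.3f}")
```

Because each iteration spends budget only on the newly branched suffix, prefixes that already score well are reused instead of regenerated; that reuse is the general intuition for why a tree search can be cheaper than generating every combination of candidates. How AniMaker actually allocates generations and scores clips is not specified in the abstract.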