Story2Board: A Training-Free Approach for Expressive Storyboard Generation

We present Story2Board, a training-free framework for expressive storyboard generation from natural language. Existing methods narrowly focus on subject identity, overlooking key aspects of visual storytelling such as spatial composition, background evolution, and narrative pacing. To address this, we introduce a lightweight consistency framework composed of two components: Latent Panel Anchoring, which preserves a shared character reference across panels, and Reciprocal Attention Value Mixing, which softly blends visual features between token pairs with strong reciprocal attention. Together, these mechanisms enhance coherence without architectural changes or fine-tuning, enabling state-of-the-art diffusion models to generate visually diverse yet consistent storyboards. To structure generation, we use an off-the-shelf language model to convert free-form stories into grounded panel-level prompts. To evaluate, we propose the Rich Storyboard Benchmark, a suite of open-domain narratives designed to assess layout diversity and background-grounded storytelling, in addition to consistency. We also introduce a new Scene Diversity metric that quantifies spatial and pose variation across storyboards. Our qualitative and quantitative results, as well as a user study, show that Story2Board produces more dynamic, coherent, and narratively engaging storyboards than existing baselines.
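The abstract says only that Reciprocal Attention Value Mixing "softly blends visual features between token pairs with strong reciprocal attention"; it does not say how reciprocity is measured or how the blend is weighted. The following is a minimal NumPy sketch of that general idea under explicit assumptions, not the paper's implementation. It assumes a row-stochastic attention matrix over the tokens of all panels, scores a pair's reciprocity as the weaker of its two directions, `min(A[i, j], A[j, i])`, pairs each token with its strongest partner in a different panel, and blends value vectors only when that score exceeds a threshold. The function name `reciprocal_value_mix` and the parameters `tau` and `alpha` are hypothetical.

```python
import numpy as np


def reciprocal_value_mix(attn, values, panel_ids, tau=0.05, alpha=0.5):
    """Hypothetical sketch: softly blend values of strongly reciprocal token pairs.

    attn:      (n, n) attention matrix over all tokens (assumed row-stochastic).
    values:    (n, d) value vectors, one per token.
    panel_ids: (n,) panel index of each token; only cross-panel pairs are mixed.
    tau:       reciprocity threshold (hypothetical parameter).
    alpha:     blend weight toward the partner's value (hypothetical parameter).
    """
    attn = np.asarray(attn, dtype=float)
    values = np.asarray(values, dtype=float)
    panel_ids = np.asarray(panel_ids)

    # Assumed reciprocity score: a pair is only as strong as its weaker direction.
    recip = np.minimum(attn, attn.T)
    # Zero out same-panel pairs (this also removes each token's self-pair).
    recip = np.where(panel_ids[:, None] == panel_ids[None, :], 0.0, recip)

    # Each token's strongest cross-panel partner and the corresponding score.
    partner = recip.argmax(axis=1)
    score = recip[np.arange(len(values)), partner]

    # Blend only strongly reciprocal tokens, always reading the ORIGINAL values
    # so the result does not depend on the order in which tokens are processed.
    mixed = values.copy()
    strong = score > tau
    mixed[strong] = (1 - alpha) * values[strong] + alpha * values[partner[strong]]
    return mixed
```

For example, with two panels of two tokens each, where token 0 (panel 0) and token 2 (panel 1) attend to each other with weight 0.3 and every other cross-panel pair is weaker, calling `reciprocal_value_mix(attn, np.eye(4), [0, 0, 1, 1], tau=0.2)` moves tokens 0 and 2 halfway toward each other and leaves tokens 1 and 3 unchanged. Taking the minimum of the two attention directions is one simple way to demand mutual rather than one-sided attention; the paper may use a different criterion.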