LongAnimation: Long Animation Generation with Dynamic Global-Local Memory

Animation colorization is a crucial part of real animation industry production. Because long animation colorization incurs high labor costs, automating it with video generation models has significant research value. Existing studies are limited to short-term colorization: they adopt a local paradigm, fusing overlapping features to achieve smooth transitions between local segments. However, the local paradigm neglects global information and fails to maintain long-term color consistency. In this study, we argue that ideal long-term color consistency can be achieved through a dynamic global-local paradigm, i.e., dynamically extracting global color-consistent features relevant to the current generation. Specifically, we propose LongAnimation, a novel framework that mainly includes a SketchDiT, a Dynamic Global-Local Memory (DGLM), and a Color Consistency Reward. The SketchDiT captures hybrid reference features to support the DGLM module. The DGLM module employs a long video understanding model to dynamically compress global historical features and adaptively fuse them with the current generation features. To refine color consistency, we introduce a Color Consistency Reward. During inference, we propose a color consistency fusion to smooth video segment transitions. Extensive experiments on both short-term (14 frames) and long-term (average 500 frames) animations show the effectiveness of LongAnimation in maintaining short-term and long-term color consistency for the open-domain animation colorization task. The code can be found at https://cn-makers.github.io/long_animation_web/.
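
Below is a minimal sketch (not the authors' implementation) of the dynamic global-local idea described above: long global historical features are compressed into a few memory tokens and then adaptively fused into the current segment's features. The module name, feature shapes, and the use of learned-query cross-attention for both compression and fusion are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class DynamicGlobalLocalMemory(nn.Module):
    """Illustrative global-local memory: compress history, fuse into current features."""

    def __init__(self, dim: int = 256, num_heads: int = 4, num_queries: int = 16):
        super().__init__()
        # Learnable queries that summarize (compress) the global history.
        self.memory_queries = nn.Parameter(torch.randn(num_queries, dim))
        self.compress_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, current_feats: torch.Tensor, history_feats: torch.Tensor) -> torch.Tensor:
        # current_feats: (B, N_cur, dim) features of the segment being generated
        # history_feats: (B, N_hist, dim) features of previously generated frames
        B = current_feats.shape[0]
        queries = self.memory_queries.unsqueeze(0).expand(B, -1, -1)
        # Dynamically compress the (potentially long) history into a few tokens.
        compressed, _ = self.compress_attn(queries, history_feats, history_feats)
        # Adaptively fuse the compressed global memory into the current features.
        fused, _ = self.fuse_attn(current_feats, compressed, compressed)
        return self.norm(current_feats + fused)


if __name__ == "__main__":
    dglm = DynamicGlobalLocalMemory()
    cur = torch.randn(1, 77, 256)      # current-segment tokens (hypothetical size)
    hist = torch.randn(1, 2048, 256)   # long history of prior segments (hypothetical size)
    print(dglm(cur, hist).shape)       # -> torch.Size([1, 77, 256])
```

Conditioning the compression on a small, fixed number of query tokens keeps the memory cost independent of the history length, which is the property a long-video setting (hundreds of frames) requires.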