Paper2Video: Automatic Video Generation from Scientific Papers
Zeyu Zhu, Kevin Qinghong Lin, Mike Zheng Shou

Abstract
Academic presentation videos have become an essential medium for research communication, yet producing them remains highly labor-intensive, often requiring hours of slide design, recording, and editing for a video of only 2 to 10 minutes. Unlike natural video, presentation video generation poses distinctive challenges: inputs drawn from research papers, dense multi-modal information (text, figures, tables), and the need to coordinate multiple aligned channels such as slides, subtitles, speech, and the human talker. To address these challenges, we introduce Paper2Video, the first benchmark of 101 research papers paired with author-created presentation videos, slides, and speaker metadata. We further design four tailored evaluation metrics (Meta Similarity, PresentArena, PresentQuiz, and IP Memory) to measure how well a video conveys the paper's information to the audience. Building on this foundation, we propose PaperTalker, the first multi-agent framework for academic presentation video generation. It integrates slide generation with effective layout refinement driven by a novel, efficient tree-search visual choice, together with cursor grounding, subtitling, speech synthesis, and talking-head rendering, while parallelizing slide-wise generation for efficiency. Experiments on Paper2Video demonstrate that the presentation videos produced by our approach are more faithful and informative than those of existing baselines, establishing a practical step toward automated, ready-to-use academic video generation. Our dataset, agent, and code are available at https://github.com/showlab/Paper2Video.
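The abstract notes that per-slide work (subtitling, speech synthesis, cursor grounding, talking-head rendering) depends only on the slide it belongs to, which is what makes slide-wise parallelization possible. Below is a minimal sketch of that idea under stated assumptions: every function name and each stubbed step is a hypothetical stand-in for illustration, not the PaperTalker API.

```python
# A minimal sketch of slide-wise parallel generation. Each slide's
# downstream assets depend only on that slide, so per-slide pipelines
# can run concurrently and be concatenated in order afterwards.
# All names and stubbed steps below are hypothetical illustrations.
from concurrent.futures import ThreadPoolExecutor

def make_subtitle(slide: dict) -> str:
    # Stand-in for subtitle generation from the slide's content.
    return f"Narration for: {slide['title']}"

def synthesize_speech(subtitle: str) -> bytes:
    # Stand-in for a text-to-speech call; returns fake audio bytes.
    return subtitle.encode("utf-8")

def ground_cursor(slide: dict, subtitle: str) -> list[tuple[int, int]]:
    # Stand-in for cursor grounding: where the cursor should point
    # on the slide while each part of the subtitle is spoken.
    return [(0, 0)]

def render_talking_head(audio: bytes) -> str:
    # Stand-in for talking-head rendering driven by the speech track.
    return f"talker_segment_{len(audio)}.mp4"

def generate_slide_assets(slide: dict) -> dict:
    # Per-slide pipeline: subtitling -> speech -> cursor -> talker.
    subtitle = make_subtitle(slide)
    audio = synthesize_speech(subtitle)
    return {
        "slide": slide,
        "subtitle": subtitle,
        "cursor": ground_cursor(slide, subtitle),
        "talker": render_talking_head(audio),
    }

def build_presentation(slides: list[dict]) -> list[dict]:
    # Slides are independent, so map the per-slide pipeline in
    # parallel; final assembly concatenates the segments in order.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(generate_slide_assets, slides))

if __name__ == "__main__":
    segments = build_presentation([{"title": "Motivation"},
                                   {"title": "Method"}])
    print([s["talker"] for s in segments])
```

The design point this sketch captures is that only the cross-slide planning step is sequential; once the slide plans exist, everything downstream fans out per slide, which is where the efficiency gain the abstract mentions would come from.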