Latte: The World's First Open-Source Text-to-Video DiT
Paper | Project Page
Project Introduction
Since the release of Sora, video DiT (Diffusion Transformer) models have received a lot of attention and discussion. Designing stable, ultra-large-scale neural networks has long been a research focus in visual generation, and the success of DiT has made it possible to scale image generation. Latte (Latent Diffusion Transformer for Video Generation) is a video generation model that was open-sourced in November 2023. As the world's first open-source video DiT, Latte has achieved promising results.
This tutorial demonstrates how to run the Latte project and view the videos it generates.
Demo

Tutorial
Customized text-to-video generation using Latte
1. Clone the container and start it
2. Open the workspace and set the text prompt
In the file browser on the left, locate the configuration file home/Latte/configs/t2v/t2v_sample.yaml and double-click to open it. Modify the text under text_prompt (the file already contains example prompts), then press Ctrl+S to save.
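Before generating, it can help to confirm that the edited prompts were actually saved. The following is a minimal sketch, not part of the Latte project: show_prompts is a hypothetical helper, and it assumes the prompts sit on the lines directly after the text_prompt key and that the command is run from inside the Latte/ directory.

```shell
# Hypothetical helper: print the text_prompt entry and the lines after it.
# $1: path to the config file (defaults to the t2v sample config)
# $2: number of lines to show after the key (defaults to 10)
show_prompts() {
  config="${1:-configs/t2v/t2v_sample.yaml}"
  context="${2:-10}"
  # -n prefixes each line with its line number, -A prints trailing context
  grep -n -A "$context" 'text_prompt' "$config"
}
```

Calling show_prompts with no arguments prints the prompt section of the default config with line numbers. If the output still shows the old text, the file was not saved.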

3. Generate Video
Open a terminal and type: cd Latte/
to change into the project directory.
Then type in the terminal: bash sample/t2v.sh
to generate high-definition video. Wait for the program to finish running; the generated results are in the Latte/sample_videos directory. t2v_0000-.mp4 is the combined video of all the text prompts, and the other .mp4 files are the videos generated from individual prompts.
Note: The generated video cannot be viewed directly in the container. You need to right-click the file to download the video to your local computer for viewing.
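The generation step above can be wrapped in a small guarded function. This is only a sketch under stated assumptions: generate_t2v is a hypothetical name, and the script path and output directory are taken from this tutorial, not verified against the project.

```shell
# Hypothetical wrapper around the text-to-video step.
# $1: path to the Latte project directory (defaults to ./Latte)
generate_t2v() {
  repo="${1:-Latte}"
  # Stop early if the sampling script is not where the tutorial says it is.
  [ -f "$repo/sample/t2v.sh" ] || {
    echo "sample/t2v.sh not found under $repo" >&2
    return 1
  }
  # Run in a subshell so the caller's working directory is unchanged.
  ( cd "$repo" && bash sample/t2v.sh ) || return 1
  # List the generated videos with their sizes.
  ls -lh "$repo/sample_videos"/*.mp4
}
```

Run it from the directory that contains Latte/, for example: generate_t2v Latte. The listing at the end shows which .mp4 files are available to download.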
Other code information
Latte inference code
Training Latte on four standard video generation datasets (FaceForensics, SkyTimelapse, UCF101 and Taichi-HD) yields four models, each of which generates videos of the corresponding scene type. The steps are explained below. First enter the project directory: open a terminal and type: cd Latte/
1. FaceForensics: Face Detection from Synthetic Images
Type in the terminal: bash sample/ffs.sh
to generate face videos. After the program finishes, check the generated results in the Latte/test_ffs directory.
Note: Each generated result will overwrite the previous result.
2. SkyTimelapse: Photographic sky images
Type in the terminal: bash sample/sky.sh
to generate sky videos. After the program finishes, the generated results are in the Latte/test_sky directory in the file browser on the left; download them to your local computer for viewing.
3. UCF101: Action Recognition from Realistic Action Videos
Type in the terminal: bash sample/ucf101.sh
to generate realistic action videos. After the program finishes, the generated results are in the Latte/test_UCF101 directory; download them to your local computer for viewing.
4. Taichi-HD: High-definition video generation
Type in the terminal: bash sample/taichi.sh
to generate high-definition video. After the program finishes, the generated results are in the Latte/test_Taichi directory; download them to your local computer for viewing.
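Because each run overwrites the previous results, it may be worth copying a results directory aside before running a script again. The sketch below runs the four sampling scripts in turn and backs up any existing output first. backup_results and run_all_samples are hypothetical helpers, and the script and directory names are simply those listed in this tutorial.

```shell
# Hypothetical helper: copy an existing results directory to a timestamped name.
backup_results() {
  dir="$1"
  [ -d "$dir" ] || return 0   # nothing to back up yet
  cp -r "$dir" "${dir}_$(date +%Y%m%d_%H%M%S)"
}

# Hypothetical helper: run all four sampling scripts, backing up old results.
# $1: path to the Latte project directory (defaults to ./Latte)
run_all_samples() {
  repo="${1:-Latte}"
  # Each entry is script-name:output-directory, as named in this tutorial.
  for pair in ffs:test_ffs sky:test_sky ucf101:test_UCF101 taichi:test_Taichi; do
    script="sample/${pair%%:*}.sh"
    out="${pair##*:}"
    backup_results "$repo/$out"
    ( cd "$repo" && bash "$script" ) || {
      echo "failed: $script" >&2
      return 1
    }
  done
}
```

Calling run_all_samples Latte from the parent directory leaves the newest results in the usual directories and earlier ones in copies such as test_ffs_<timestamp>. The copies use disk space, so delete the ones you no longer need.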