HyperAI

Latte: The World's First Open-Source Text-to-Video DiT

Paper | Project Page

Project Introduction

Since the release of Sora, video DiT (Diffusion Transformer) models have attracted wide attention and discussion. Designing stable, very large neural networks has long been a research focus in visual generation, and the success of DiT made it possible to scale up image generation. Latte (Latent Diffusion Transformer for Video Generation) is a video generation model that was open-sourced in November 2023. As the world's first open-source video DiT, Latte has achieved promising results.

This tutorial shows how to run Latte and reproduce its results.

Sample results

[Figure 1: sample output]

Tutorial

Customized text-to-video generation using Latte

1. Clone the container and run it

2. Open the workspace and set the text prompt

In the file browser on the left, double-click home/Latte/configs/t2v/t2v_sample.yaml to open the configuration file, then edit the text under text_prompt. Example prompts are already provided, as shown in the figure below. Press Ctrl+S to save.

[Figure 2: t2v_sample.yaml with example text_prompt entries]
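Before generating, it can help to confirm that the edited prompts were actually saved. The following is a minimal sketch, not part of the Latte project: show_prompts is a hypothetical helper, and its default path assumes the terminal is already inside the Latte/ directory. It prints the text_prompt line and the lines that follow it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Latte): print the text_prompt entry of a
# config file plus the lines after it, to confirm that edits were saved.
# $1: config path (default assumes the current directory is Latte/)
# $2: number of lines to show after the text_prompt line (default 10)
show_prompts() {
    config="${1:-configs/t2v/t2v_sample.yaml}"
    grep -A "${2:-10}" 'text_prompt' "$config"
}
```

From inside Latte/, calling show_prompts with no arguments prints the current prompt section. If nothing is printed, the file has no text_prompt entry or the path is wrong.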

3. Generate Video

Open a terminal and type cd Latte/ to change into the project directory.

Then type bash sample/t2v.sh in the terminal to generate the high-definition videos, and wait for the program to finish. The generated results are in the Latte/sample_videos directory: t2v_0000-.mp4 is the combined video covering all of the text prompts, and each of the other .mp4 files is the video generated from a single prompt.

Note: The generated video cannot be viewed directly in the container. You need to right-click the file to download the video to your local computer for viewing.
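The generation step can be wrapped in a small script that runs the sampler and then lists the resulting files, so it is clear which videos to download. This is a sketch under stated assumptions: LATTE_DIR, list_videos and generate_t2v are hypothetical names introduced here, and only sample/t2v.sh and sample_videos come from the tutorial.

```shell
#!/usr/bin/env bash
# LATTE_DIR is a hypothetical variable; the tutorial simply uses `cd Latte/`.
LATTE_DIR="${LATTE_DIR:-Latte}"

# Print every .mp4 file found directly inside the given directory, sorted.
list_videos() {
    find "$1" -maxdepth 1 -type f -name '*.mp4' | sort
}

# Run the tutorial's sampling script inside the project directory,
# then list the videos it wrote to sample_videos.
generate_t2v() {
    ( cd "$LATTE_DIR" && bash sample/t2v.sh ) || return 1
    list_videos "$LATTE_DIR/sample_videos"
}
```

Calling generate_t2v from the directory that contains Latte/ runs the sampler and prints one path per generated video. The subshell around cd keeps the terminal's working directory unchanged.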

Other code information

Latte inference code

Latte provides four models, each trained on one of four standard video generation datasets (FaceForensics, SkyTimelapse, UCF101 and Taichi-HD). Each model generates videos of the corresponding scene type. To run them, first enter the project directory: open a terminal and type cd Latte/

1. FaceForensics: Face Detection from Synthetic Images

Type in the terminal: bash sample/ffs.sh

This generates face videos. After the program finishes, check the generated results in the Latte/test_ffs directory.

Note: Each generated result will overwrite the previous result.
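Because each run overwrites the previous output, one option is to copy the results directory aside before running the script again. The sketch below assumes nothing about Latte itself: backup_results is a hypothetical helper that copies a directory to a timestamped sibling and prints the new path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a results directory to "<dir>_<timestamp>" so
# that the next run does not overwrite the only copy of the results.
backup_results() {
    src="${1%/}"                                # drop a trailing slash, if any
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    dest="${src}_$(date +%Y%m%d_%H%M%S)"        # e.g. test_ffs_20240101_120000
    cp -r "$src" "$dest" && echo "$dest"
}
```

For example, running backup_results test_ffs from inside Latte/ before the next bash sample/ffs.sh keeps the earlier videos in a separate directory.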

2. SkyTimelapse: Photographic sky images

Type in the terminal: bash sample/sky.sh

This generates sky videos. After the program finishes, find the generated results in the Latte/test_sky directory in the file browser on the left and download them to your local computer for viewing.

3. UCF101: Action Recognition from Realistic Action Videos

Type in the terminal: bash sample/ucf101.sh

This generates realistic action videos. After the program finishes, find the generated results in the Latte/test_UCF101 directory and download them to your local computer for viewing.

4. Taichi-HD: High-definition video generation

Type in the terminal: bash sample/taichi.sh

This generates high-definition videos. After the program finishes, find the generated results in the Latte/test_Taichi directory and download them to your local computer for viewing.
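The four commands above follow the same pattern: one script per dataset, one output directory per script. The sketch below collects that mapping in one place. The keys ffs, sky, ucf101 and taichi and the function names are hypothetical labels chosen here. The script and directory names are the ones listed in the steps above, written relative to Latte/.

```shell
#!/usr/bin/env bash
# Map a dataset key to "<script> <output directory>", both relative to Latte/.
latte_job() {
    case "$1" in
        ffs)    echo "sample/ffs.sh test_ffs" ;;
        sky)    echo "sample/sky.sh test_sky" ;;
        ucf101) echo "sample/ucf101.sh test_UCF101" ;;
        taichi) echo "sample/taichi.sh test_Taichi" ;;
        *)      echo "unknown dataset: $1" >&2; return 1 ;;
    esac
}

# Run the script for one dataset and report where its results are written.
# Must be called from inside the Latte/ directory.
run_latte_job() {
    job="$(latte_job "$1")" || return 1
    script="${job%% *}"    # text before the space: the script path
    outdir="${job##* }"    # text after the space: the output directory
    bash "$script" && echo "Results in: $outdir"
}
```

For example, after cd Latte/, run_latte_job sky runs sample/sky.sh and then reports test_sky as the results directory. An unrecognized key fails without running anything.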