
GeneFace++ Digital Human Demo

GeneFace++: A universal and stable real-time audio-driven 3D talking face generation technique

This tutorial provides a pre-built environment; just follow the steps below to generate your own personalized digital human.

1. Project Introduction

GeneFace generates audio-driven 3D talking-face videos with synchronized lip motion. It is a general and high-fidelity NeRF-based talking face generation method that produces natural results for a wide range of out-of-domain audio. Specifically, GeneFace learns a variational motion generator on a large lip-reading corpus and introduces a domain-adaptive post-processing network to calibrate the results. Extensive experiments show that, compared with previous methods, GeneFace achieves more general and higher-fidelity talking face generation.

2. Running the Demo

Video tutorial reference: [Zero experience, step-by-step] The complete process of creating a cloned digital human: GeneFace++ with one-click setup and no hardware requirements, a digital avatar tutorial anyone can follow

1. Enter the web interface

After starting the container, click the API address to enter the Web interface

2. Shut down the web interface

Press Ctrl + C in the terminal that is running the web service to terminate it. If that terminal page has been closed, select the second item on the left side of the workspace tab to reopen the closed terminal.

3. Terminal notes

If a terminal is running a program, other commands cannot be run. You can choose to terminate the current program or open a new terminal to run other commands.

3. Training a Personalized Model


1. Preparation

Prepare your own digital human video. A square frame is recommended because the video will be automatically cropped to 512 × 512 during training; videos with other aspect ratios may end up with black borders after cropping, and large black borders hurt training quality (see the ffmpeg sketch after the sample screen below for one way to pre-crop the video). The video must be in mp4 format. Drag the video into the upload area in the lower-left corner of the webpage to upload it.

Note:

  • The video title should not contain Chinese characters.
  • The video background must be clean and free of unnecessary elements, preferably a solid color background. If there are too many or cluttered background elements, background extraction may fail.
  • The face in the video must be clear and should occupy the main part of the frame. A close-up framed from the shoulders up is recommended rather than a half-body video; otherwise the face will come out blurry.

The sample screen is as follows ⬇️
(sample image: meimei)
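If your source video is not square, you can pre-crop it yourself before uploading. Below is a minimal sketch, assuming ffmpeg is available and using input.mp4 and input_square.mp4 as placeholder file names; it crops the centered square region and scales it to 512 × 512 so that no black borders are added.

# crop the centered square region and scale it to 512x512; the audio track is copied unchanged
ffmpeg -i input.mp4 -vf "crop='min(iw,ih)':'min(iw,ih)',scale=512:512" -c:a copy input_square.mp4

Check the result before uploading to make sure the face is still centered and fills most of the frame, as described in the notes above.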

Training parameter recommendations

Choose a suitable number of training steps; more than 50,000 steps is recommended. The following timings are for reference:

  • Sample videos provided by the project (May):
    The video is about 4 minutes long, and it takes about 1 hour to create the dataset. It takes about 1 hour to train a single model for 50,000 steps.
    The overall training requires the creation of two models, which takes about 2-3 hours in total.
  • Video length recommendations:
    The video length should be 3-5 minutes. If the video is too short, it will be difficult to achieve good results even if you train for a long time; if the video is too long, it will prolong the training time.

Training progress

You can run the following command in the terminal to view the training progress:

tail -f app.log

When the log prints This Video has Trained!, training is complete.
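If you only want to watch for the completion message rather than the full log, you can filter it; a minimal sketch, assuming the log file is app.log as in the command above:

# print only the line announcing that training has finished
tail -f app.log | grep --line-buffered "This Video has Trained"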

Model training results

After training completes, open the torso model ckpt path or directory field and find the two folders related to your video (located in the motion2video_nerf directory). Select the model in the videoid_torso folder; it is ready to use. (A command for listing the checkpoint folders is given after the notes below.)

Note:

  • Both of the first two model fields on the right must be selected; otherwise the default model will be used.
  • The first model is the audio-to-motion model, located in the audio2motion_vae folder.
  • If the head model is selected in the torso field, an error may be reported:
    Inference ERROR: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!!
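To double-check which trained checkpoints exist before selecting them in the web interface, you can list the output directory from a terminal; a minimal sketch, using the same output path as the delete commands in the Retraining section below:

# each trained video should have a {VIDEO_ID}_head and a {VIDEO_ID}_torso folder here
ls /openbayes/home/geneface_out/checkpoints/motion2video_nerf/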

Using the trained model

If you already have a trained model, you can place it in a similar folder structure and use it directly.
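For example, an existing head/torso checkpoint pair can be copied into the location the web interface reads from; a minimal sketch, where /path/to/ and myvideo are placeholders for wherever your model folders are stored and your video name:

# copy an existing head/torso checkpoint pair into the expected checkpoints folder
cp -r /path/to/myvideo_head  /openbayes/home/geneface_out/checkpoints/motion2video_nerf/
cp -r /path/to/myvideo_torso /openbayes/home/geneface_out/checkpoints/motion2video_nerf/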

2. Retraining

The entire training process can be summarized as three steps: generating the dataset, training the head model, and training the torso (shoulder) model. To avoid repeating work or starting over after an unexpected interruption, the code records each step as it completes. For a video with the same name, completed steps are skipped and training resumes directly from the last unfinished step.

If you want to retrain the model with a different number of steps, simply delete the two trained folders. You can run the following commands to delete the model folders. If you only want to retrain the torso (shoulder) model, delete only the torso folder.

rm -r /openbayes/home/geneface_out/checkpoints/motion2video_nerf/<your_video_name>_head
rm -r /openbayes/home/geneface_out/checkpoints/motion2video_nerf/<your_video_name>_torso

4. Cleaning Data

If you do not plan to retrain, you can clean up the dataset. Run the following script to delete all of the dataset's files and free up space. Because building a dataset takes a long time, think carefully before deleting it. Enter the video name without the .mp4 suffix.

/output/clear_video_data.sh <your_video_name>

If you want to completely clear all the data of a video, you can run the following script:

/output/clear_all_data.sh <your_video_name>

5. Introduction to folder structure

When you train a new video, the following files and folders will be generated under geneface_out, where VIDEO_ID is the name of your video.

- geneface_out

-- binary
--- videos
---- {VIDEO_ID}
----- trainval_dataset.npy # the prepared dataset; used for both inference and training

-- processed # holds intermediate data while the dataset is being built
--- videos
---- {VIDEO_ID} # all data extracted from the video lives here; no longer needed once the dataset (the trainval_dataset.npy above) is built

-- raw
--- videos
---- {VIDEO_ID}.mp4 # the 512*512 version of your uploaded video; not needed after the dataset is built

-- view_check # symlink used to access checkpoints
-- checkpoints
--- motion2video_nerf # stores the trained model data
---- {VIDEO_ID}_head
---- {VIDEO_ID}_torso
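To see how much space each of these folders takes before deciding whether to run the cleanup scripts in Section 4, you can check their sizes; a minimal sketch, assuming geneface_out is located at /openbayes/home/geneface_out as in the delete commands above:

# show the size of each top-level folder under geneface_out
du -sh /openbayes/home/geneface_out/*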