Tutorial Included: Voice Cloning Model GPT-SoVITS, 5 Seconds of Speech Can Clone a Voice With a Similarity of 95%

Speech was the "early education" technology through which people first came into contact with AI, and it is also one of the earliest AI technologies to leave the laboratory and enter ordinary households. Early research on intelligent speech focused mainly on speech recognition, that is, on making machines understand human language.

The earliest computer-based speech recognition system was Audrey, developed by AT&T Bell Labs, which could recognize the ten spoken English digits. In 1988, Kai-Fu Lee built Sphinx, the first large-vocabulary, speaker-independent continuous speech recognition system, based on the Hidden Markov Model. In 1997, Dragon NaturallySpeaking, the world's first consumer-oriented continuous speech dictation system, was officially released. In 2009, Microsoft integrated speech functions into the Windows 7 operating system.

In 2011, the milestone iPhone 4S was released, and the birth of Siri took intelligent voice from recognition to a new stage of "interaction". That same year, Google announced that it was testing voice search for Google Search internally and would roll it out on Google.com in the coming days.

The transition from hearing to speaking is also an important cornerstone of the growth of human-computer interaction. Today, from smart homes to smart driving to robots, voice interaction has become smoother as AI keeps improving, and applications of all kinds are flourishing. On the technical side, major cloud computing vendors have opened up their AI voice capabilities as APIs, on which developers can build their own applications.

In recent years, as large models have remained popular, open-sourcing at the model level has received more and more attention. Developers can train and fine-tune the models themselves so that they fit better with the applications they build.

Not long ago, the founder of RVC (Retrieval-based Voice Conversion) (GitHub account: RVC-Boss) open-sourced a voice cloning project, GPT-SoVITS. It became very popular as soon as it was released. Many bloggers and developers created custom lines in the voices of popular film, television, and cartoon characters. The striking results and easy-to-use experience also attracted a wave of netizens, adding fuel to its popularity. According to tests by popular bloggers, only 5 seconds of voice samples are needed to obtain a cloned voice with a similarity of 80%~95%.

Currently, the model deployment tutorial has been launched on the HyperAI official website. Click to start cloning:

https://hyper.ai/tutorials/29812

The editor asked the Genshin Impact character Paimon to make a cameo appearance as the empress in Legend of Zhen Huan. Paimon becomes Empress Ulanara in seconds.

The AI voice cloning tutorial made by Jack-Cui, a popular uploader on Bilibili, is as follows:

https://www.bilibili.com/video/BV1WC411W79t/?spm_id_from=333.788&vd_source=5e54209e1f8c68b7f1dc3df8aabf856c

The step-by-step tutorial is as follows. Once you have 5 seconds of speech ready, you can start training your voice cloning model!

Data Preparation

This tutorial comes with many classic character voices preset for everyone to try. If you want to clone a different voice, you need to prepare an audio file of that voice in MP3 format, preferably containing a single voice only (about 30 seconds). High-quality audio files improve the realism of the cloned voice.

1. Click "Run this tutorial online" to jump to the OpenBayes platform.

2. Click "Clone" to copy the model. (With this step alone, you can only try the voices uploaded by Jack-Cui, the Bilibili uploader.)

3. If you want to clone a voice of your own, you need to create a new dataset. Go to "Dataset" in the left menu bar and click "Create New Dataset".

4. After filling in the "Dataset Name" and "Dataset Description" as required, click "Create Dataset".

5. After creation is complete, click "Upload New Version" in the upper right corner and upload the audio file you want to clone.
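
Before uploading, it can help to check the clip against the requirements described above. The following is only a minimal Python sketch, not part of the tutorial or the OpenBayes platform: the function names and thresholds are illustrative assumptions taken from the figures in this article (5 seconds as the minimum, about 30 seconds as the suggested length), and probe_duration assumes the ffprobe tool from FFmpeg is installed locally.

```python
import subprocess
from pathlib import Path

# Illustrative thresholds read from this article, not limits enforced by the platform.
MIN_SECONDS = 5.0
RECOMMENDED_SECONDS = 30.0


def probe_duration(path):
    """Return the clip length in seconds by calling ffprobe (requires FFmpeg)."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1",
            str(path),
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return float(result.stdout.strip())


def check_clip(filename, duration_seconds):
    """Return a list of warnings; an empty list means the clip looks suitable."""
    warnings = []
    # The article asks for an MP3 file.
    if Path(filename).suffix.lower() != ".mp3":
        warnings.append("file is not in MP3 format")
    # Flag clips that are too short, or shorter than the suggested length.
    if duration_seconds < MIN_SECONDS:
        warnings.append("clip is shorter than 5 seconds")
    elif duration_seconds < RECOMMENDED_SECONDS:
        warnings.append("clip is shorter than the roughly 30 seconds recommended")
    return warnings
```

For a local file, one would call check_clip(path, probe_duration(path)) and fix any warnings before clicking "Upload New Version". The check only covers format and length; whether the recording contains a single clean voice still has to be judged by ear.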

Demo Run

1. Once the data is ready, go to "Public Tutorial" in the left menu bar, open "GPT-SoVITS Audio Synthesis Online Demo" to return to the tutorial page, and click "Clone" in the upper right corner to clone the tutorial into your own container.

2. The demo already has audio data for Klee, Hua Fei, Zhen Huan, and Pang Ju bound to it, and the binding slots are full. Delete any audio data you do not need, then add your own dataset.

3. After adding, click "Review and Execute".

4. After the page is redirected, click "Continue". RTX 4090 is recommended.

The editor has secured new user benefits for everyone! New users can register using the invitation link below to get 4 hours of RTX 4090 + 5 hours of free CPU computing time.

HyperAI exclusive invitation link (copy and open in browser to register):

https://openbayes.com/console/signup?r=Ada0322_QZy7

5. Wait for a while, and when the status changes to "Running", click "Open Workspace". It takes about 3-5 minutes to clone and start the container for the first time. If it is still in the "Allocating Resources" state after more than 10 minutes, try to stop and restart the container; if restarting still does not solve the problem, please contact the platform customer service on the official website.

6. After opening the workspace, click "run.ipynb" on the left, click the "Run" button in the menu bar, and click "Run All Cells".

7. Find "Running on public URL" and open the link.

8. In the "Dataset Address" module, enter the address of the dataset containing the voice you want to clone this time. After selecting the audio data type, click "Start Training". When the output shows "The model is starting prediction, please wait", return to "run.ipynb" and you will see "GPT training completed".

9. Open the "API Address" on the right. Please note that users must complete real-name authentication before using the API address access function.

Effect Display

1. Select the trained model in "GPT Model List" and "SoVITS Model List", then enter the text in "Inference text", click "Start inference", wait a moment, and you can have fun!

HyperAI's official website currently hosts hundreds of selected machine learning tutorials, all organized as Jupyter notebooks.

Click the link to search for related tutorials and datasets:

https://hyper.ai/tutorials

HyperAI