Online Tutorial | Hands-On Evaluation of 3 Voice Cloning Models: GPT-SoVITS Accurately Captures the Voice of "Shiji Niangniang"

The box office of the Spring Festival film "Nezha 2" has been soaring: it has now passed 12 billion yuan, making it the first Chinese film to break the 10-billion-yuan mark, and it has entered the all-time global box-office top 10. In the film, the voice actors brought the characters vividly to life, from Nezha's raspy "smoky voice" to Taiyi Zhenren's Sichuan dialect to Shiji Niangniang's playful charm, sparking wide public discussion and pushing the behind-the-scenes art of dubbing into the spotlight.
Speaking of the charm of dubbing, the Bai Jingjing skin for Mi Yue in "Honor of Kings" is a perfect example. The developers invited Wang Huijun, the original voice actor of Bai Jingjing in the film "A Chinese Odyssey", to voice the character once again. When the familiar line "You and I must believe that letting go is also a kind of fate" rang out, it instantly reawakened many players' youthful nostalgia, and they "opened their wallets" for the sentiment.
Nowadays, voice cloning technology is advancing rapidly. With the help of advanced voice cloning models, ordinary users can cross time and space, reproduce the unique voice of a favorite character with one click, and easily satisfy their "dubbing addiction"! Three mainstream open-source models stand out: GPT-SoVITS, Fish Speech v1.4, and F5-E2 TTS. Each has its own strengths and plays a key role in different scenarios, whether film and television creation, audio content production, or casual fun dubbing.
The "Tutorial" section of HyperAI's official website is now online:
* GPT-SoVITS audio synthesis online demo:
https://hyper.ai/cn/tutorials/29812
* Fish Speech v1.4 Voice Cloning-Text to Speech Tool Demo:
https://hyper.ai/cn/tutorials/34680
* F5-E2 TTS clones any sound in just 3 seconds:
https://hyper.ai/cn/tutorials/35468
Today, I will give you a detailed introduction to these three open-source voice cloning models, and use the same reference audio and prompt text to evaluate their actual results for you!
GPT-SoVITS Audio Synthesis
* Release time: 2022
* Developer: Bilibili creator Huaer Buku
* One-click deployment:
https://hyper.ai/cn/tutorials/29812
The model combines SoVITS with Transformer-based speech encoding and caused a sensation in the AI speech-synthesis community as soon as it launched. Its high-fidelity synthesis is distinctive: with as little as a 5-second voice sample, it can perform zero-shot text-to-speech (TTS) conversion.
Taking Shiji Niangniang's voice in the film Nezha as an example: with GPT-SoVITS, we only need to collect a short clip of her classic lines from the film as a sample to accurately reproduce her lovely, lively, and forceful voice.
Fish Speech v1.4 Voice Cloning
* Release time: 2024
* Developer: Fish Audio Team
* One-click deployment:
https://hyper.ai/cn/tutorials/34680
The model was trained on roughly 150,000 hours of data and is proficient in Chinese, Japanese, and English. Its language processing approaches human level, and its vocal expression is rich and varied. Users can freely adjust timbre, pitch, and speaking rate to create their own voices, meeting personalized needs for character voices across different creative scenarios.
F5-E2 TTS clones any sound in just 3 seconds
* Release time: 2024
* Developers: Shanghai Jiao Tong University, University of Cambridge, and Geely Automobile Research Institute (Ningbo) Co., Ltd.
* One-click deployment:
https://hyper.ai/cn/tutorials/35468
F5 TTS uses a non-autoregressive generation method based on flow matching, combined with a Diffusion Transformer (DiT), to quickly generate natural, fluent speech faithful to the input text via zero-shot learning, without additional supervision. The core of E2 TTS is its fully non-autoregressive design: it generates the entire speech sequence at once rather than step by step, which significantly speeds up generation while maintaining high-quality output, achieving multi-voice hybrid cloning from as little as 3 seconds of audio.
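The flow-matching idea above can be illustrated with a toy sketch. Note this is a deliberate simplification, not the real F5-TTS: a trained DiT would predict the velocity field from text and reference audio, whereas here we use the closed-form velocity for a known target so the parallel ODE integration is runnable end to end.

```python
import numpy as np

def velocity(x_t, t, x1):
    # Along the linear path x_t = (1 - t) * x0 + t * x1, the conditional
    # velocity is (x1 - x0), which can be rewritten as (x1 - x_t) / (1 - t).
    # In F5-TTS a trained DiT predicts this field; here we use the closed form.
    return (x1 - x_t) / (1.0 - t)

def euler_sample(x0, x1, steps=100):
    # Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data).
    # Each Euler step updates ALL frames of the sequence in parallel,
    # which is what makes non-autoregressive generation fast.
    x = x0.copy()
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity(x, i * dt, x1)
    return x

rng = np.random.default_rng(0)
target = rng.normal(size=(80, 200))   # stand-in for an 80-band mel spectrogram
noise = rng.normal(size=target.shape) # starting point: pure Gaussian noise
generated = euler_sample(noise, target)
print(np.abs(generated - target).max())  # transport error is tiny
```

Because the whole spectrogram is refined jointly over a fixed number of ODE steps, generation cost does not grow token by token as it would in an autoregressive model.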
This model supports 3 functions:
* Single-speaker voice generation (Batched TTS): generates speech for input text in the voice of the uploaded reference audio.
* Podcast generation: simulates a two-person conversation based on audio from two speakers.
* Multiple speech-type generation: given recordings of the same speaker in different emotions, generates audio in each of those emotional styles.
The above is the review of the voice cloning models we prepared for you. If you are interested, come and experience them for yourself!