HyperAI초신경


Text-to-Speech Synthesis on LJSpeech

Evaluation Metric

Audio Quality MOS
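
A mean opinion score (MOS) is the arithmetic mean of listener ratings, usually given on a 1 to 5 scale. The sketch below shows one common way such a score and a 95% confidence interval might be computed. The function name `mean_opinion_score`, the sample ratings, and the normal-approximation interval are illustrative assumptions, not the protocol used by any paper listed on this page.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (mos, half_width) for a list of 1-5 listener ratings.

    half_width is a 95% confidence half-interval based on the normal
    approximation (1.96 standard errors); this choice is an assumption.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.fmean(ratings)
    stderr = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, 1.96 * stderr


# Hypothetical ratings from eight listeners for one synthesized utterance.
ratings = [4, 5, 4, 3, 5, 4, 4, 5]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")  # prints "MOS = 4.25 +/- 0.49"
```

Published MOS values depend on the listener pool, the test utterances, and the rating instructions, so scores from different papers are only loosely comparable.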

Evaluation Results

Performance of each model on this benchmark

Model	Audio Quality MOS	Paper Title	Repository
FastSpeech (Mel + WaveGlow)	3.84	FastSpeech: Fast, Robust and Controllable Text to Speech
FastDiff-TTS	4.03	FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis
FastSpeech 2 + HiFiGAN	4.34	NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
Grad-TTS + HiFiGAN (1000 steps)	4.37	Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
Flowtron	-	Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
Transformer TTS (Mel + WaveGlow)	3.88	Neural Speech Synthesis with Transformer Network
VITS	4.43	NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
Matcha-TTS	-	Matcha-TTS: A fast TTS architecture with conditional flow matching
Tacotron 2	-	Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
FastSpeech 2 + HiFiGAN	4.32	FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
temp	1.25	-	-
FastDiff (4 steps)	4.28	FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis
OverFlow	3.37	OverFlow: Putting flows on top of neural transducers for better TTS
NaturalSpeech	4.56	NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
Merlin	2.4	FastSpeech: Fast, Robust and Controllable Text to Speech
Glow-TTS + HiFiGAN	4.34	Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
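
The rows above are not listed in score order. As a minimal sketch, the snippet below ranks them by Audio Quality MOS and skips entries whose score is given as "-". The names `rows` and `rank_by_mos` are hypothetical; the model names and scores are copied from the table as-is, including the two FastSpeech 2 + HiFiGAN entries that come from different papers.

```python
# (model, MOS) pairs copied from the table; "-" means no score is listed.
rows = [
    ("FastSpeech (Mel + WaveGlow)", "3.84"),
    ("FastDiff-TTS", "4.03"),
    ("FastSpeech 2 + HiFiGAN", "4.34"),
    ("Grad-TTS + HiFiGAN (1000 steps)", "4.37"),
    ("Flowtron", "-"),
    ("Transformer TTS (Mel + WaveGlow)", "3.88"),
    ("VITS", "4.43"),
    ("Matcha-TTS", "-"),
    ("Tacotron 2", "-"),
    ("FastSpeech 2 + HiFiGAN", "4.32"),
    ("temp", "1.25"),
    ("FastDiff (4 steps)", "4.28"),
    ("OverFlow", "3.37"),
    ("NaturalSpeech", "4.56"),
    ("Merlin", "2.4"),
    ("Glow-TTS + HiFiGAN", "4.34"),
]


def rank_by_mos(rows):
    """Return (model, score) pairs sorted from highest to lowest MOS."""
    scored = [(name, float(mos)) for name, mos in rows if mos != "-"]
    # sorted() is stable, so tied scores keep their original table order.
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranked = rank_by_mos(rows)
for position, (name, mos) in enumerate(ranked, start=1):
    print(f"{position:2d}. {name}: {mos:.2f}")
```

Because the scores were reported by different papers under different listening tests, the resulting order is only a rough guide and not a controlled comparison.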
