VoxCPM: Tokenizer-Free TTS Technology
1. Tutorial Introduction
VoxCPM is a 0.5B-parameter speech generation model released in September 2025 by ModelBest (Mianbi Intelligence) and Tsinghua University Shenzhen International Graduate School. Its developers report industry-leading naturalness, timbre similarity, and prosodic expressiveness in speech synthesis. VoxCPM uses an end-to-end diffusion autoregressive architecture to generate continuous speech representations directly from text, which avoids the limitations of traditional discrete tokenization. Through hierarchical language modeling and finite scalar quantization (FSQ) constraints, it implicitly decouples semantics from acoustics, which significantly improves the expressiveness and stability of the generated speech. VoxCPM supports zero-shot voice cloning: given only a single reference audio clip, it can replicate the speaker's timbre, accent, emotional intonation, and other characteristics to generate highly realistic speech.
This tutorial runs on a single RTX 4090 GPU.
2. Demo

3. Steps
1. Start the container
If "Bad Gateway" is displayed, the model is still initializing. Because the model is large, wait about 2-3 minutes and then refresh the page.
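The wait-and-refresh step can also be scripted. The following is a minimal sketch, not part of the tutorial itself: it polls a page until it stops returning an error such as 502 Bad Gateway. The helper names `http_status` and `wait_until_ready` are hypothetical, and the URL in the usage note is a placeholder for whatever address your container exposes.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (including 502 Bad Gateway) arrive as HTTPError.
        return err.code
    except OSError:
        # Covers URLError, refused connections, and socket timeouts.
        return 0


def wait_until_ready(
    probe: Callable[[], int],
    max_wait: float = 300.0,
    interval: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() every `interval` seconds until it returns 200.

    Returns True once the service answers with 200, or False if it is
    still not ready after roughly `max_wait` seconds of waiting.
    """
    waited = 0.0
    while True:
        if probe() == 200:
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
```

To use it, pass a probe for your own address, for example `wait_until_ready(lambda: http_status("http://localhost:8080"))`, where the URL is an assumption and should be replaced with the one shown for your container.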

2. Usage steps

Specific parameters:
- CFG Value: Controls guidance strength. Higher values make the output adhere more closely to the prompt; lower values allow more variation.
- Inference Timesteps: The number of inference steps used during generation. Higher values may improve quality at the cost of slower generation.
- Prompt Speech Enhancement: Uses the ZipEnhancer model to denoise the prompt audio.
- Text Normalization: Uses the wetext library to normalize the input text.
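For readers who prefer scripting to the web interface, the sketch below shows how these four controls might map onto the `voxcpm` Python package. It assumes the keyword names used in the upstream README's `generate()` example (`cfg_value`, `inference_timesteps`, `denoise`, `normalize`, `prompt_wav_path`, `prompt_text`), the `openbmb/VoxCPM-0.5B` checkpoint, and a 16 kHz output rate; these may differ between package versions. The `SynthesisSettings` class, its default values, and its validation rules are illustrative and not part of the library.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SynthesisSettings:
    """Hypothetical container mirroring the web UI controls."""

    cfg_value: float = 2.0                 # "CFG Value"
    inference_timesteps: int = 10          # "Inference Timesteps"
    denoise: bool = True                   # "Prompt Speech Enhancement"
    normalize: bool = True                 # "Text Normalization"
    prompt_wav_path: Optional[str] = None  # reference clip for voice cloning
    prompt_text: Optional[str] = None      # transcript of the reference clip

    def to_kwargs(self) -> dict:
        """Validate the settings and return them as keyword arguments."""
        if self.cfg_value <= 0:
            raise ValueError("cfg_value must be positive")
        if self.inference_timesteps < 1:
            raise ValueError("inference_timesteps must be at least 1")
        # Assumption: a reference clip and its transcript are supplied together.
        if (self.prompt_wav_path is None) != (self.prompt_text is None):
            raise ValueError("provide both prompt_wav_path and prompt_text, or neither")
        return asdict(self)


def synthesize(text: str, settings: SynthesisSettings, out_path: str = "output.wav") -> str:
    """Generate speech for `text` and write it to `out_path`."""
    # Imported lazily so the settings helper works without the model installed.
    import soundfile as sf
    from voxcpm import VoxCPM

    model = VoxCPM.from_pretrained("openbmb/VoxCPM-0.5B")
    wav = model.generate(text=text, **settings.to_kwargs())
    sf.write(out_path, wav, 16000)  # assumed sample rate for this checkpoint
    return out_path
```

A call such as `synthesize("Hello from VoxCPM.", SynthesisSettings(cfg_value=2.5, inference_timesteps=20))` would trade some speed for closer adherence to the prompt. For voice cloning, set `prompt_wav_path` to a reference clip and `prompt_text` to its transcript.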

Citation Information
The citation information for this project is as follows:
@misc{voxcpm2025,
  author       = {{Yixuan Zhou, Guoyang Zeng, Xin Liu, Xiang Li, Renjie Yu, Ziyang Wang, Runchuan Ye, Weiyue Sun, Jiancheng Gui, Kehan Li, Zhiyong Wu, Zhiyuan Liu}},
  title        = {{VoxCPM}},
  year         = {2025},
  howpublished = {\url{https://github.com/OpenBMB/VoxCPM}},
  note         = {GitHub repository}
}