OuteTTS: Speech Generation Engine
1. Tutorial Introduction

OuteTTS is an open-source text-to-speech (TTS) project released by the Oute AI team in early November 2024. Its core innovation is that it generates high-quality speech with a pure language-modeling approach. It does not need the complex adapters or external modules that traditional TTS systems rely on. Its main features include:
- Text-to-speech synthesis: generates natural, fluent speech from input text, with customizable speaking rate and intonation.
- Voice cloning: from a reference audio clip as short as a few seconds plus its transcript, users can create a personalized voice. This suits custom voice assistants, audiobooks, and similar scenarios.
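The "pure language modeling" idea can be illustrated with a toy example. In this kind of approach, speech is represented as discrete audio-codec tokens that share one sequence with text tokens, and an ordinary next-token predictor writes out the audio tokens. For voice cloning, a reference transcript and the codes of its audio are placed in the prompt, so the model continues in that voice. Everything below is a hypothetical simplification for illustration: the vocabulary sizes, the token layout, and the stub model. None of it is OuteTTS's actual prompt format or code.

```python
"""Toy illustration of TTS as plain next-token prediction.

Hypothetical throughout: vocabulary layout, token IDs and the stub model are
made up for illustration and are NOT OuteTTS's real prompt format.
"""
from typing import Callable, List, Optional

TEXT_VOCAB = 1000                        # hypothetical number of text tokens
AUDIO_VOCAB = 1024                       # hypothetical audio-codec codebook size
AUDIO_OFFSET = TEXT_VOCAB                # audio code c <-> token AUDIO_OFFSET + c
END_OF_AUDIO = TEXT_VOCAB + AUDIO_VOCAB  # hypothetical stop token


def build_prompt(target_text: List[int],
                 ref_text: Optional[List[int]] = None,
                 ref_codes: Optional[List[int]] = None) -> List[int]:
    """Without a reference the prompt is just the target text. With one, the
    reference transcript and its audio codes come first, so the model is
    conditioned to keep speaking in the reference voice."""
    prompt: List[int] = []
    if ref_text is not None and ref_codes is not None:
        prompt += ref_text + [AUDIO_OFFSET + c for c in ref_codes]
    return prompt + target_text


def generate_codes(next_token: Callable[[List[int]], int],
                   prompt: List[int], max_new: int = 100) -> List[int]:
    """Autoregressively extend the prompt and collect the generated audio codes."""
    seq = list(prompt)
    codes: List[int] = []
    for _ in range(max_new):
        tok = next_token(seq)
        if tok == END_OF_AUDIO:
            break
        if AUDIO_OFFSET <= tok < END_OF_AUDIO:
            codes.append(tok - AUDIO_OFFSET)  # map the token back to a codec code
        seq.append(tok)
    return codes  # a codec decoder (not shown) would turn these into a waveform


def stub_model(seq: List[int]) -> int:
    """Stand-in for the language model: emits audio tokens, stops at length 20."""
    if len(seq) >= 20:
        return END_OF_AUDIO
    return AUDIO_OFFSET + len(seq) % AUDIO_VOCAB


if __name__ == "__main__":
    cloned = build_prompt([5, 6, 7], ref_text=[1, 2], ref_codes=[10, 11])
    print(cloned)
    print(generate_codes(stub_model, cloned))
```

Running it prints the cloning prompt `[1, 2, 1010, 1011, 5, 6, 7]` and then the stub's 13 codes, 7 through 19. In a real system, a neural audio codec would decode such codes into a waveform.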
The model used in this tutorial is Llama-OuteTTS-1.0-1B, released by Oute AI in March 2025. Its parameter count grows from 350 million to 1 billion, which significantly improves expressiveness and stability. It also supports native synthesis in 20 languages, with further improved cross-lingual voice cloning.
This tutorial runs on a single RTX 4090 GPU and provides two usage examples: Default Speaker and Voice Cloning. The model itself is multilingual, but this tutorial supports English only.
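A rough back-of-the-envelope check shows why a single RTX 4090 (24 GB of VRAM) is enough. The sketch below estimates the memory taken by the model weights alone for both parameter counts. It assumes 16-bit (fp16/bf16) weights as the main case and ignores activations, the KV cache, and the audio decoder, so real usage is higher. The helper name is hypothetical.

```python
# Back-of-the-envelope weight memory; ignores activations, KV cache and the audio decoder.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2}


def weight_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


# Nominal parameter counts stated in this tutorial.
MODELS = {"350M model": 350e6, "Llama-OuteTTS-1.0-1B": 1e9}

for name, n_params in MODELS.items():
    for dtype, nbytes in BYTES_PER_PARAM.items():
        print(f"{name:<22} {dtype:<10} {weight_gib(n_params, nbytes):5.2f} GiB")
```

For the 1B model this gives about 1.86 GiB in 16-bit precision, a small fraction of the card's 24 GB.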
2. Demo

3. Steps
1. Start the container

2. Usage steps
If the page shows "Bad Gateway", the model is still initializing. Because the model is large, wait about 2-3 minutes and then refresh the page.
In Safari, the generated audio may not play directly in the browser. Download it first and then play it locally.
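If you script against the demo instead of refreshing by hand, you can automate the "wait, then refresh" advice. The script polls the page until it stops returning an error such as 502 Bad Gateway. The sketch below uses only the Python standard library. The function names, the 10-second interval, and the 300-second timeout are illustrative assumptions. The demo's address is not given here, so you must supply your own.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status code for url, or -1 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # e.g. 502 Bad Gateway while the model loads
        return err.code
    except OSError:  # URLError (connection refused, DNS failure) or a socket timeout
        return -1


def wait_until_ready(url: str, timeout_s: float = 300.0, interval_s: float = 10.0,
                     fetch: Callable[[str], int] = http_status,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll url until it answers with a 2xx status; return False if timeout_s elapses first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if 200 <= fetch(url) < 300:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)  # wait a bit, then "refresh" again
```

For example, `wait_until_ready("http://<your-demo-address>/")` returns `True` once the page loads normally. Passing `fetch` and `sleep` as parameters keeps the loop testable without a network.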
Parameters:
- Text: the text to synthesize into speech.
- Temperature: scaling factor that controls the randomness of the output; lower values make it more deterministic.
- Repetition Penalty: penalty factor that discourages repeating previously generated tokens.
- Top-k: restricts each sampling step to the k most likely candidate tokens.
- Top-p: nucleus sampling; samples from the smallest set of candidates whose cumulative probability reaches p.
- Minimum Probability (min-p): sets a minimum probability threshold that candidate tokens must meet to be kept.
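The following minimal sketch makes the sampling parameters more concrete. It shows how they commonly act on a model's next-token scores, applied in one typical order: repetition penalty, temperature, top-k, top-p, min-p. It assumes a CTRL-style repetition penalty and the usual min-p rule, where the threshold is min_p × the top token's probability. The order and formulas used by this demo's inference backend may differ. The function names are hypothetical and are not part of OuteTTS.

```python
import math
import random
from typing import Dict, List, Optional


def filter_candidates(logits: List[float], history: List[int], temperature: float = 1.0,
                      repetition_penalty: float = 1.0, top_k: int = 0,
                      top_p: float = 1.0, min_p: float = 0.0) -> Dict[int, float]:
    """Return surviving candidate token IDs mapped to renormalized probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    scores = list(logits)
    # Repetition penalty (CTRL-style): push already-generated tokens toward lower scores.
    for tok in set(history):
        s = scores[tok]
        scores[tok] = s / repetition_penalty if s > 0 else s * repetition_penalty
    # Temperature: values < 1 sharpen the distribution, values > 1 flatten it.
    scores = [s / temperature for s in scores]
    # Softmax, subtracting the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    ranked = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    # Top-k: keep only the k most likely tokens (0 disables), then renormalize.
    if top_k > 0:
        ranked = ranked[:top_k]
        z = sum(p for p, _ in ranked)
        ranked = [(p / z, i) for p, i in ranked]
    # Top-p (nucleus): smallest prefix whose cumulative probability reaches top_p.
    if top_p < 1.0:
        kept, cum = [], 0.0
        for p, i in ranked:
            kept.append((p, i))
            cum += p
            if cum >= top_p:
                break
        ranked = kept
    # Min-p: drop tokens whose probability is below min_p times the top probability.
    if min_p > 0.0:
        threshold = min_p * ranked[0][0]
        ranked = [(p, i) for p, i in ranked if p >= threshold]
    z = sum(p for p, _ in ranked)
    return {i: p / z for p, i in ranked}


def sample_next(logits: List[float], history: List[int],
                rng: Optional[random.Random] = None, **params) -> int:
    """Draw one token ID from the filtered distribution."""
    rng = rng or random.Random()
    cands = filter_candidates(logits, history, **params)
    tokens = list(cands)
    return rng.choices(tokens, weights=[cands[t] for t in tokens], k=1)[0]
```

For instance, `sample_next(logits, history, temperature=0.4, top_k=40, top_p=0.9, min_p=0.05)` draws one token with illustrative settings, not values recommended by OuteTTS. Lowering `temperature` or `top_k` makes the output more deterministic.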
1. Default Speaker

2. Voice Cloning

