
NeuTTS-Air: A Lightweight and Efficient Voice Cloning Model

1. Tutorial Introduction


NeuTTS-Air is an end-to-end text-to-speech (TTS) model released by Neuphonic in October 2025. Built on a 0.5B-parameter Qwen LLM backbone and the NeuCodec audio codec, it shows few-shot capabilities in on-device deployment and instant voice cloning. According to the reported system evaluation, NeuTTS-Air reaches a state-of-the-art level among open-source models, particularly in highly realistic synthesis and real-time inference benchmarks. It also generalizes to new scenarios such as embedded agents and style transfer, can clone a voice from about 3 seconds of reference audio, and generates natural-sounding conversational speech. Post-training additions include GGML/ONNX support and a watermarking mechanism. The model leads open-source systems in on-device TTS and power-optimization evaluations, and in some scenarios it is comparable to closed-source models.

This tutorial runs on a single RTX 5090 GPU. The model supports English only.

2. Project Examples

3. Operation Steps

1. After starting the container, click the API address to open the web interface.

2. Once the page has loaded, you can start using the model.

If "Bad Gateway" is displayed, the application is still starting up in the background. Wait about 2-3 minutes, then refresh the page.
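Instead of refreshing by hand, the wait can be scripted. The following is a minimal sketch, not part of the tutorial itself: it polls an address until it answers with HTTP 200 or a time budget is used up. The function names, the 10-second interval, and the 5-minute budget are assumptions chosen for illustration, and the URL in the usage comment is a placeholder for the API address shown for your own container.

```python
import time
import urllib.error
import urllib.request


def probe_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Raised for 4xx/5xx answers, e.g. 502 Bad Gateway during startup.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar errors.
        return None


def wait_until_ready(url, max_wait=300.0, interval=10.0,
                     probe=probe_status, sleep=time.sleep):
    """Poll url until it returns HTTP 200 or max_wait seconds of sleeping have passed.

    Returns True once the page is reachable, False if the budget runs out.
    The probe and sleep parameters are injectable so the loop can be tested offline.
    """
    waited = 0.0
    while True:
        if probe(url) == 200:
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval


# Usage (placeholder address, replace with your container's API address):
# wait_until_ready("http://localhost:8080")
```

A `False` result only means the page did not answer with 200 within the budget; the 2-3 minute figure above is approximate, so a longer `max_wait` may be needed.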

In Safari, the audio may not play directly in the page. If that happens, download the file and play it locally.

How to use
