HyperAI超神経

Resemble AI Unveils Chatterbox: State-of-the-Art Open Source Text-to-Speech Model with Emotion Control

3 days ago

Resemble AI is proud to introduce Chatterbox, its first production-grade open-source Text-to-Speech (TTS) model. Released under the permissive MIT license, Chatterbox has been benchmarked against leading commercial TTS systems, such as ElevenLabs, and often comes out as the preferred choice in side-by-side evaluations. Chatterbox is designed to bring a variety of content to life, making it well suited to memes, videos, games, and AI agents. One of its standout features is emotion exaggeration control, a capability that sets it apart from other open-source models and lets users add depth and nuance to synthesized speech.

To get started, you can try Chatterbox immediately via the Hugging Face Gradio app. For those who need a more scalable solution or higher accuracy, Resemble AI offers a competitively priced TTS service with ultra-low latency of less than 200 milliseconds, suited to real-time applications such as conversational agents and interactive media.

Key Details

- License: MIT
- Benchmark: Consistently preferred over leading closed-source TTS systems
- Emotion Control: Supports emotion exaggeration
- Platform: Hugging Face Gradio app

Installation

For a seamless setup, we recommend installing Chatterbox with the following pip command:

    pip install git+https://github.com/resemble-ai/chatterbox

Alternatively, if you prefer a more customized installation, you can clone the repository and set it up from source. This mode allows you to modify the code or dependencies as needed. We primarily developed and tested Chatterbox on Python 3.11 under Debian 11, and the specific dependency versions are pinned in the pyproject.toml file to ensure consistency.

Usage

To see Chatterbox in action, check out the provided example scripts: example_tts.py and example_vc.py.

Supported Language

Currently, Chatterbox supports English speech synthesis only.
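
The article does not show what a synthesis call looks like, so here is a minimal sketch of text-to-speech with emotion exaggeration. It rests on assumptions that are not stated in the article: the `ChatterboxTTS.from_pretrained` constructor, the `exaggeration` and `cfg_weight` arguments to `generate`, and the `model.sr` sample-rate attribute follow the project's README as best recalled and should be checked against example_tts.py. The preset names and the `settings_for` and `synthesize` helpers are hypothetical and exist only for illustration.

```python
# Hypothetical style presets (names and values are illustrative, not part of
# Chatterbox). A higher `exaggeration` is assumed to ask for a more intense
# delivery; `cfg_weight` is assumed to influence pacing.
PRESETS = {
    "neutral": {"exaggeration": 0.5, "cfg_weight": 0.5},
    "dramatic": {"exaggeration": 0.7, "cfg_weight": 0.3},
}


def settings_for(style: str) -> dict:
    """Return a copy of the generation settings for a named preset."""
    if style not in PRESETS:
        raise KeyError(f"unknown style {style!r}; choose from {sorted(PRESETS)}")
    return dict(PRESETS[style])


def synthesize(text: str, out_path: str, style: str = "neutral",
               device: str = "cpu") -> None:
    """Generate speech for `text` and write it to `out_path` as a WAV file."""
    # Imported lazily so the preset helpers work without the model installed.
    import torchaudio
    from chatterbox.tts import ChatterboxTTS  # assumed import path

    model = ChatterboxTTS.from_pretrained(device=device)  # assumed constructor
    wav = model.generate(text, **settings_for(style))     # assumed keyword names
    torchaudio.save(out_path, wav, model.sr)              # assumed `sr` attribute
```

With Chatterbox installed, a call such as `synthesize("Welcome back!", "welcome.wav", style="dramatic")` would download the model on first use and write the audio file. Keeping the presets in plain data makes it easy to compare several exaggeration levels on the same text.
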

Built-in Perth Watermarking for Responsible AI

Chatterbox includes Resemble AI’s Perth (Perceptual Threshold) Watermarker, which adds an imperceptible neural watermark to every generated audio file. The watermark is designed to survive common manipulations, including MP3 compression and audio editing, while maintaining nearly 100% detection accuracy. This feature underscores our commitment to ethical and responsible AI usage.

Watermark Extraction

To verify the presence of the watermark, you can use the following script provided by Resemble AI:

```python
# Example script for extracting the watermark (uses the perth and librosa packages)
import librosa
import perth

audio_file = "path/to/your/audio/file.mp3"
# Load the audio at its native sample rate
audio, sample_rate = librosa.load(audio_file, sr=None)
# Use the same watermarker type that embeds the watermark
watermarker = perth.PerthImplicitWatermarker()
watermark = watermarker.get_watermark(audio, sample_rate=sample_rate)
print(watermark)
```

Community Engagement

We invite you to join our official Discord server to connect with the community, share ideas, and collaborate on projects. Together, we can push the boundaries of what’s possible with TTS technology.

Disclaimer

Please use Chatterbox responsibly and do not use the model for malicious purposes. The prompts used to train the model come from freely available data on the internet.

With Chatterbox, Resemble AI aims to democratize access to cutting-edge TTS technology, empowering developers and creators to innovate and bring their projects to life. Whether you’re a hobbyist or a professional, we believe you’ll find Chatterbox a valuable addition to your toolkit. Try it out today and experience the future of speech synthesis!
