
From WeChat's Grayscale Test to the Entry of Google, ByteDance, and Xiaohongshu: Can AI Podcasts Seize the New Blue Ocean of AIGC?

Readers who follow WeChat Official Accounts may have noticed that WeChat launched a new "News" section some time ago. In this section, users can not only read daily news but also listen to a news podcast presented as a conversation between a male and a female host, covering trending topics, international news, and more. Notably, the label "generated by AI" appears clearly below the podcast title, which indicates that WeChat is running a grayscale test of AI podcasts. The move echoes two earlier developments: Tencent Hunyuan officially launched an AI podcast feature on August 5, and Tencent Cloud Smart Media publicly showcased its "AI News + AI Podcast" industry solution at the Tencent Global Digital Ecosystem Conference from September 16 to 17.

In recent years, AI image generation (such as Midjourney and Stable Diffusion) and AI video generation (such as Veo 3 and Wan2.2) have become global sensations. Now AI podcasting is emerging as the next hot trend. With breakthroughs in large language models that generate conversational scripts and the maturing of high-fidelity speech synthesis, podcasting, a format that has long depended heavily on its creators, is also being reshaped by AI.

The secret behind the "liveliness" of AI podcasts

If you search social media for recommendations of high-quality podcasts, "podcasts with a live feel" is bound to appear among the related search terms. This "liveliness" refers to the hosts' natural emotional expression in conversation: the pauses and hesitations between words, the sudden bursts of laughter, and the friendly arguments are what make listeners feel truly present. When people think of AI podcasts, however, they tend to picture everyday smart voice assistants, such as those on phones, in cars, and in smart homes, whose generally mechanical delivery inevitably creates a preconception. So the question arises: can AI podcasts truly "speak like a real person" and make listeners forget they are listening to a machine? Before answering, let's listen to a short clip.

(Audio content generated by AI)

It is easy to notice that the two hosts' banter, with one praising and the other teasing in a tacit back-and-forth, sounds quite "human". In fact, this clip is an AI podcast generated by Doubao with one click. Moreover, this kind of "liveliness" is no longer an isolated case in the industry. The shift from mechanical electronic voices to human-like speech is driven by the same core line of technology: modern neural text-to-speech (TTS).

Unlike traditional TTS, which produced speech through mechanical rule-based synthesis or by splicing together recorded fragments, modern TTS uses deep learning models to better capture multiple dimensions of speech, such as intonation, timbre, speaking rate, emotion, and style, and thereby generates more natural, fluent, and expressive speech. On top of this, techniques such as adversarial training, large-language-model-based speech modeling, and multimodal conditional control have made model-generated speech increasingly difficult to distinguish from human speech.

For example, Microsoft released a new TTS model, VibeVoice-1.5B, in August this year. By combining continuous speech tokenizers with a next-token diffusion framework built on a large language model, it can efficiently process long audio sequences.

Online tutorial link: https://go.hyper.ai/6ruF7
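Why a low tokenizer frame rate helps with long audio can be seen with simple arithmetic. The sketch below assumes the 7.5 Hz frame rate reported for VibeVoice's tokenizers and the roughly 90-minute maximum length reported for the model; the 50 Hz comparison rate is an illustrative assumption, not the figure for any specific codec.

```python
# Illustrative arithmetic: how many speech frames a model must handle for a
# long podcast at different tokenizer frame rates.
# 7.5 Hz is the rate reported for VibeVoice's continuous tokenizers;
# 50 Hz is an assumed comparison rate, not taken from any specific codec.

def frames_needed(duration_minutes: float, frame_rate_hz: float) -> int:
    """Number of speech frames for audio of the given length."""
    return round(duration_minutes * 60 * frame_rate_hz)


if __name__ == "__main__":
    podcast_minutes = 90  # roughly the maximum length reported for VibeVoice
    for label, rate in [("7.5 Hz tokenizer", 7.5), ("50 Hz codec (assumed)", 50.0)]:
        print(f"{label}: {frames_needed(podcast_minutes, rate):,} frames")
```

At 7.5 Hz, 90 minutes of audio comes to 40,500 frames versus 270,000 at 50 Hz, so the language model has about 6.7 times fewer steps to process.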

Mianbi Intelligence and Tsinghua University's Shenzhen International Graduate School jointly developed VoxCPM, a 0.5B-parameter speech generation model. It uses an end-to-end diffusion autoregressive architecture to generate continuous speech representations directly from text, moving beyond the limitations of traditional discrete speech tokenization. The model achieves impressive naturalness, timbre similarity, and prosodic expressiveness in speech synthesis.

Online tutorial link: https://go.hyper.ai/frmze
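To make the contrast between discrete tokens and continuous representations concrete, here is a toy sketch, not VoxCPM's code: a continuous feature vector is snapped to the nearest entry of a small codebook (vector quantization), and the distance it moves is the detail that discretization throws away. The 2-D features and 4-entry codebook are hypothetical and chosen only for readability.

```python
import math

Vector = tuple[float, ...]


def nearest_code(x: Vector, codebook: list[Vector]) -> Vector:
    """Return the codebook entry closest to x (Euclidean distance)."""
    return min(codebook, key=lambda c: math.dist(x, c))


def quantization_error(x: Vector, codebook: list[Vector]) -> float:
    """Distance between x and the discrete code that would replace it."""
    return math.dist(x, nearest_code(x, codebook))


if __name__ == "__main__":
    # Hypothetical 4-entry codebook over 2-D "speech features".
    codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    frame = (0.3, 0.8)  # a continuous feature produced by some encoder
    print("discrete token:", nearest_code(frame, codebook))
    print("information lost:", round(quantization_error(frame, codebook), 3))
    # A continuous representation keeps `frame` as-is, so this error is zero.
```

Real codecs use much larger codebooks and higher-dimensional features, which shrinks this error but never removes it entirely; generating continuous representations directly sidesteps the rounding step altogether.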

IndexTTS-2, from Bilibili's voice team, proposes a novel, general-purpose speech duration control method that works well with autoregressive models, and is presented as the first autoregressive TTS model to support precise duration control.

Online tutorial link: https://go.hyper.ai/z7Jdt
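Duration control is hard for autoregressive TTS because the output length is normally decided by when the model emits a stop token. The simplified sketch below shows one generic remedy, fixing a token budget from the target duration and never sampling the stop token in that mode. It is not IndexTTS-2's actual mechanism, and the token rate, stop-token id, vocabulary size, and random stand-in "model" are all hypothetical.

```python
import random
from typing import Optional

TOKENS_PER_SECOND = 25   # hypothetical speech-token rate
STOP_TOKEN = 0           # hypothetical end-of-speech token id
VOCAB_SIZE = 1024        # hypothetical speech-token vocabulary size


def toy_next_token(rng: random.Random, allow_stop: bool) -> int:
    """Stand-in for a real model: a random speech token, or STOP with 2% chance
    when stopping is allowed (masking the stop token means never sampling it)."""
    if allow_stop and rng.random() < 0.02:
        return STOP_TOKEN
    return rng.randrange(1, VOCAB_SIZE)


def generate(target_seconds: Optional[float] = None, seed: int = 0,
             max_tokens: int = 5000) -> list[int]:
    """Free mode (no target): the model decides when to stop.
    Controlled mode: emit exactly round(target_seconds * TOKENS_PER_SECOND) tokens."""
    rng = random.Random(seed)
    budget = None if target_seconds is None else round(target_seconds * TOKENS_PER_SECOND)
    limit = max_tokens if budget is None else budget
    tokens: list[int] = []
    while len(tokens) < limit:
        # In controlled mode the stop token is never sampled; generation ends
        # exactly when the token budget is used up.
        tok = toy_next_token(rng, allow_stop=budget is None)
        if tok == STOP_TOKEN:
            break
        tokens.append(tok)
    return tokens


if __name__ == "__main__":
    free = generate(seed=1)
    fixed = generate(target_seconds=3.2, seed=1)
    print(len(free) / TOKENS_PER_SECOND, "s (free mode, length chosen by the model)")
    print(len(fixed) / TOKENS_PER_SECOND, "s (controlled mode)")
```

A real system would apply the mask to the model's output distribution rather than to a random stand-in; how IndexTTS-2 conditions its model on the target length is described in its own paper and is not reproduced here.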

The HyperAI official website (hyper.ai) has launched a number of one-click deployment tutorials for high-quality open source TTS models in the "Tutorials" section. You are welcome to visit and experience them.

The current AI podcast ecosystem: two types of players and multiple tracks

At the application level, the aforementioned technologies have gradually entered the public eye. Currently, AI podcast products on the market can be divided into two camps based on their backgrounds:

On the one hand, the entry of big tech players has undoubtedly added fuel to the AI podcast race and quickly drawn attention to the field. The first product to break into the mainstream was Google's NotebookLM, known for its highly condensed Audio Overviews. Although it was designed to help users digest information quickly, its powerful audio capabilities have also made it an efficient tool for producing AI podcasts. After recent updates, it now supports more than 50 languages, including Chinese, removing its earlier English-only limitation. ByteDance's Doubao relies on the large-model capabilities of Volcano Engine to generate podcast content with one click. Its end-to-end speech dialogue can be understood as "listening, understanding, and answering at the same time", and its naturalness and texture rank among the best of Chinese AI podcasts. In addition, Xiaohongshu's audio team recently introduced FireRedTTS-2, a dialogue generation model described in the arXiv paper "FireRedTTS-2: Towards Long Conversational Speech Generation for Podcast and Chatbot".

On the other hand, startup teams are showing diverse innovation. Representative products include Laifu Radio, which bills itself as "an AI radio station exclusively for everyone" and whose programs are all generated by AI; ChatPods, launched by MiaoYa Camera founder Zhang Yueguang and his team, which focuses on personal "AI podcast agents" that use AI to generate audio summaries and make personalized podcast recommendations; and Huxe, built by former NotebookLM team members, which is likewise committed to delivering convenient, personalized content through AI. Its DeepCasts feature can instantly generate AI podcasts tailored to each user, offering customized learning anytime, anywhere.

Conclusion

Beyond the innovations in podcast content production and interaction formats described above, AI has also reached further links of the creative chain.

At the "Made on YouTube" event on September 16, YouTube CEO Neal Mohan announced a series of new AI tools. One of the more interesting is an AI audio-to-video generation tool built specifically for podcast creators, which helps them easily turn their podcasts into video clips.


Screenshot of the Made on YouTube event video

The launch of this tool is a microcosm of how deeply AI has penetrated the podcast field. For creators, AI has significantly lowered the barrier to content production: it can polish scripts and assist with editing, recommendation, and even distribution, allowing individual creators and small teams to produce high-quality shows quickly. For listeners, AI brings smarter content recommendations that help them find podcasts suited to them more efficiently and, with the support of voice assistants, an even more immersive listening experience.

Overall, AI podcasts are flourishing because of the commercial potential of the podcast field. According to the "2024 Podcast Industry Report", 45.91% of surveyed users purchased paid podcasts in the past year, and 63.61% are open to podcast advertising. As lifestyles and consumption habits change, podcasting may no longer be the "small but beautiful" niche it once was. Its potential remains to be tapped, and the monetization challenges facing the traditional podcast industry may find new solutions with AI's help. Whether through higher productivity or a more satisfying user experience, the future of the podcast industry looks promising.

Reference Links:
1. https://mp.weixin.qq.com/s/WH60YKbhAEf51si4mlZoNQ
2. https://asmp-eurasipjournals.springeropen.com/articles/10.1186/s13636-024-00329-7
3. https://mp.weixin.qq.com/s/XFK59UJu9appRpHmtsIjeg
4. https://techcrunch.com/2025/09/23/former-notebooklm-devs-new-app-huxe-taps-audio-to-help-you-with-news-and-research/
5. https://www.huxe.com/blog
6. http://www.news.cn/fortune/20250407/669ffc4208b24ce895c9b560b05ff6a0/c.html