Hume Launches Hyperrealistic Voice Cloning Feature: How to Create and Chat with Your AI Doppelgänger
On Thursday, AI startup Hume launched a new "hyperrealistic voice cloning" feature for its Empathic Voice Interface (EVI) model, specifically EVI 3, which was introduced last month. The feature lets users upload a short audio recording of their voice, ideally between 30 and 90 seconds, to create an AI-generated replica that can then hold spoken conversations much like a real person.

I tested the feature by uploading a recording of my voice and chatting with the AI version of myself. I had hoped for an uncanny valley experience, a replica that feels almost real but subtly off-putting. Instead, the result was more akin to an exaggerated, cartoonish version of my voice. The imitation captured some realistic elements, such as my pauses and vocal fry, but it fell short of replicating my personality, adopting an overly cheerful and polite tone instead. When I asked it to speak in an Australian accent, it reluctantly tried but quickly reverted to my normal voice, suggesting a limit to the model's adaptability.

In another trial, in which I recorded myself talking about Led Zeppelin, the AI clone consistently brought the conversation back to music, even when I tried to steer it toward unrelated topics like dark matter. The model appears to anchor discussions around its initial input, reminiscent of Anthropic's "Golden Gate Claude" experiment, in which a modified version of Claude steered nearly every conversation back to the Golden Gate Bridge.

Hume claims that EVI 3 can capture aspects of a speaker's personality and reproduce word emphasis, timing, and emotional cues more accurately than previous models. CEO and chief scientist Alan Cowen explained that the model's realism comes from extensive training on vast amounts of text and speech data, which allows it to mimic human vocal patterns effectively. Still, the notion of "understanding" in AI remains contentious.
Many experts argue that models like EVI 3 detect and replicate patterns rather than truly comprehend language. Even so, the technical advances in voice cloning are undeniable. Hume and similar companies predict that these models will benefit industries like entertainment and marketing, but they also raise concerns about misuse for deception. One such incident occurred recently, when an individual used AI to imitate U.S. Secretary of State Marco Rubio's voice in an attempt to deceive government officials. Linguist Emily M. Bender expressed skepticism about the need for such technology, questioning its utility beyond possible malicious uses.

The rapid advancement of generative AI is both impressive and concerning. In less than three years, tools have evolved from basic text generation to sophisticated voice and video simulation. Hume's current voice clone is a rough approximation of a human voice, but future iterations may achieve far greater realism. That progress could enable practical applications, such as AI agents representing individuals in virtual meetings; it could also enable sophisticated scams. The normalization of such technology underscores the accelerating pace of innovation, where what once seemed revolutionary quickly becomes routine.

Hume's EVI 3 is available to try for free on the company's website. The platform collects and anonymizes user data by default to improve its models, but users can opt for "Zero data retention" in their profile settings. That transparency matters as concerns over privacy and data usage grow alongside the capabilities of AI tools.

Industry insiders praise Hume's advances while warning of the ethical implications. As Alan Cowen noted, voice models can sound surprisingly human, but experts like Emily Bender caution against the potential for harmful applications.
The balance between innovation's benefits and its ethical risks remains a central consideration as AI continues to evolve. Hume's commitment to enhancing the realism and emotional depth of AI voices places it at the forefront of a rapidly growing field, but the responsibility to curb abuse and protect user privacy weighs just as heavily. As AI tools grow more sophisticated, developers and users alike will need to remain vigilant.