HyperAI

The Speaker-Specific Lip to Speech Synthesis task aims to accurately infer both the speech content and the speaking style of one particular individual, or a very small group of individuals, from their lip movements. The model is trained on that speaker's own lip movement data. Because it only needs to handle a narrow set of speakers, it can achieve highly personalized lip-to-speech conversion. The task draws on recent advances in both computer vision and speech synthesis. It has clear practical value, for example in improving video call quality, assisting communication for people with hearing impairments, and enhancing virtual reality experiences.
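To make the "train on one speaker, then infer speech from their lips" setup concrete, here is a deliberately tiny sketch. It is not the method of any model listed on this page. It assumes lip-region features and acoustic frames (for example mel-spectrogram frames) have already been extracted and aligned one-to-one per video frame; real audio frame rates are usually higher than video frame rates, so this alignment is a simplifying assumption. The names `stack_context` and `SpeakerLipToSpeech` are hypothetical. The regressor is a swapped-in k-nearest-neighbour lookup over short windows of lip frames, fitted only on a single speaker's paired data.

```python
import math
from typing import List, Sequence

Frame = List[float]


def stack_context(frames: Sequence[Sequence[float]], radius: int) -> List[Frame]:
    """Concatenate each frame with its +/- `radius` neighbours (edges are repeated)."""
    T = len(frames)
    stacked = []
    for t in range(T):
        window: Frame = []
        for k in range(t - radius, t + radius + 1):
            window.extend(frames[min(max(k, 0), T - 1)])  # clamp index at sequence edges
        stacked.append(window)
    return stacked


class SpeakerLipToSpeech:
    """Hypothetical toy model: k-NN regression from lip-feature windows to acoustic frames.

    It is fitted on ONE speaker's aligned (lip frame, acoustic frame) pairs, mirroring
    the speaker-specific setting; a real system would use learned encoders and a vocoder.
    """

    def __init__(self, radius: int = 1, k: int = 3) -> None:
        self.radius = radius
        self.k = k
        self.keys: List[Frame] = []
        self.values: List[Frame] = []

    def fit(self, lip_frames: Sequence[Sequence[float]],
            acoustic_frames: Sequence[Sequence[float]]) -> "SpeakerLipToSpeech":
        if len(lip_frames) != len(acoustic_frames):
            raise ValueError("lip and acoustic sequences must be aligned frame-by-frame")
        self.keys = stack_context(lip_frames, self.radius)
        self.values = [list(v) for v in acoustic_frames]
        return self

    def predict(self, lip_frames: Sequence[Sequence[float]]) -> List[Frame]:
        dim = len(self.values[0])
        predictions = []
        for query in stack_context(lip_frames, self.radius):
            # Indices of the k training windows closest to this query (Euclidean distance).
            nearest = sorted(range(len(self.keys)),
                             key=lambda i: math.dist(query, self.keys[i]))[: self.k]
            # Average the acoustic frames paired with those nearest lip windows.
            predictions.append([sum(self.values[i][d] for i in nearest) / len(nearest)
                                for d in range(dim)])
        return predictions
```

For example, after fitting on three 2-D lip frames paired with 1-D acoustic values `10.0`, `20.0` and `30.0`, a `k=1`, `radius=0` model maps a new lip frame close to the second training frame to `[20.0]`. The context window (`radius`) is there because a single mouth shape is often ambiguous, and neighbouring frames help disambiguate it.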

GRID corpus (mixed-speech)

Visual Voice Memory

TCD-TIMIT corpus (mixed-speech)