HyperAI

Audio-Visual Speech Recognition (AVSR) is the task of transcribing paired audio and video streams into text. It combines auditory information with visual cues such as lip movements, with the aim of making speech recognition more accurate and robust than audio alone. The technology is particularly valuable for transcription in noisy environments, lip-reading assistance, and multimodal human-computer interaction.

LRS3-TED

CTC/Attention