HyperAI超神経

Lipreading on Lip Reading in the Wild (LRW)

Evaluation Metric

Top-1 Accuracy
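The benchmark ranks models by top-1 accuracy. This is the share of test clips for which the single highest-scoring word class matches the ground-truth word. The sketch below shows that computation and is only a minimal illustration. It assumes per-clip class scores (`logits`) and integer labels are already available. The function name `top1_accuracy` and the toy 3-class data are hypothetical. LRW itself uses 500 word classes.

```python
from typing import Sequence


def top1_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Hypothetical helper: percentage of samples whose top-scoring class equals the label."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("need at least one sample")
    correct = 0
    for scores, label in zip(logits, labels):
        # argmax over the class scores; max() keeps the first maximum, so ties go to the lowest index
        predicted = max(range(len(scores)), key=lambda i: scores[i])
        correct += int(predicted == label)
    return 100.0 * correct / len(labels)


# Toy example with 3 classes (illustrative data, not LRW results)
scores = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1], [0.0, 0.4, 0.9], [0.7, 0.6, 0.2]]
targets = [1, 0, 1, 0]
print(top1_accuracy(scores, targets))  # → 75.0 (predictions 1, 0, 2, 0; three of four correct)
```

Only the argmax matters for this metric. Any monotonic transform of the scores, such as softmax, leaves the result unchanged.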

Evaluation Results

Performance of each model on this benchmark

Model | Top-1 Accuracy (%) | Paper Title
--- | --- | ---
3D Conv + ResNet-18 + Bi-GRU (Face Cutout) | 85.02 | Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition
3D Conv + ResNet-18 + MS-TCN + Multi-Head Visual-Audio Memory | 88.5 | Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading
3D Conv + ResNet-18 + DC-TCN + KD (Ensemble & Word Boundary) | 94.1 | Training Strategies for Improved Lip-reading
DFTN | 84.13 | Deformation Flow Based Two-Stream Network for Lip Reading
SyncVSR (Word Boundary) | 95.0 | SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
AVCRFormer | 89.57 | Audio-Visual Speech Recognition based on Regulated Transformer and Spatio-Temporal Fusion Strategy for Driver Assistive Systems
Multi-grained + Bi-ConvLSTM | 83.34 | Multi-Grained Spatio-temporal Modeling for Lip-reading
PCPG | 83.5 | Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading
SpotFast + Transformer + Product-Key memory | 84.4 | SpotFast Networks with Memory Augmented Lateral Transformers for Lipreading
3D Conv + ResNet-18 + MS-TCN | 85.30 | Lipreading using Temporal Convolutional Networks
Vosk + MediaPipe + LS + MixUp + SA + 3DResNet-18 + BiLSTM + Cosine WR | 88.7 | Visual Speech Recognition in a Driver Assistance System
3D Conv + EfficientNetV2 + Transformer + TCN | 89.52 | Accurate and Resource-Efficient Lipreading with Efficientnetv2 and Transformers
3D Conv + ResNet-34 + Bi-LSTM | 83.00 | Combining Residual Networks with LSTMs for Lipreading
3D-ResNet + Bi-GRU + MixUp + Label Smoothing + Cosine LR | 85.5 | Learn an Effective Lip Reading Model without Pains
MoCo + Wav2Vec by SJTU LUMIA | 85.0 | Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition
3D Conv + ResNet-18 + Bi-GRU + Visual-Audio Memory | 85.4 | Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video
3D Conv + ResNet-18 + MS-TCN + KD (Ensemble) | 88.5 | Towards Practical Lipreading with Distilled and Efficient Models
3D Conv + ResNet-18 + Bi-GRU | 84.41 | Mutual Information Maximization for Effective Lip Reading
3D Conv + ResNet-34 + Bi-GRU | 83.39 | End-to-end Audiovisual Speech Recognition
3D-ResNet + Bi-GRU + MixUp + Label Smoothing + Cosine LR (Word Boundary) | 88.4 | Learn an Effective Lip Reading Model without Pains