Speech Recognition on WSJ eval92
Metrics
Word Error Rate (WER)
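As a rough illustration of the metric, WER is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below assumes whitespace tokenization and reports a percentage, which appears to be the scale used by the values on this page. The function name `word_error_rate` is hypothetical, and the page does not specify any text normalization or scoring tool, so none is applied here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent, using word-level Levenshtein distance.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because the denominator is the reference length, a hypothesis with many insertions can score above 100.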
Results
Performance results of various models on this benchmark
| Model name | Word Error Rate (WER) | Paper Title | Repository |
| --- | --- | --- | --- |
| End-to-end LF-MMI | 3.0 | End-to-end speech recognition using lattice-free MMI | - |
| Espresso | 3.4 | Espresso: A Fast End-to-end Neural Speech Recognition Toolkit | |
| ConformerXXL-P | 1.3 | BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition | - |
| CNN over RAW speech (wav) | 5.6 | - | - |
| tdnn + chain | 2.32 | Purely sequence-trained neural networks for ASR based on lattice-free MMI | - |
| CTC-CRF ST-NAS | 2.77 | Efficient Neural Architecture Search for End-to-end Speech Recognition via Straight-Through Gradients | - |
| Jasper 10x3 | 6.9 | Jasper: An End-to-End Convolutional Neural Acoustic Model | |
| TC-DNN-BLSTM-DNN | 3.5 | Deep Recurrent Neural Networks for Acoustic Modelling | - |
| Speechstew 100M | 1.3 | SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network | - |
| test-set on open vocabulary (i.e. harder), model = HMM-DNN + pNorm* | 3.6 | - | - |
| Convolutional Speech Recognition | 3.5 | Fully Convolutional Speech Recognition | - |
| CTC-CRF VGG-BLSTM | 3.2 | CAT: A CTC-CRF based ASR Toolkit Bridging the Hybrid and the End-to-end Approaches towards Data Efficiency and Low Latency | |
| CTC-CRF 4gram-LM | 3.79 | CRF-based Single-stage Acoustic Modeling with CTC Topology | |
| Transformer with Relaxed Attention | 3.19 | Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition | |
| Deep Speech 2 | 3.60 | Deep Speech 2: End-to-End Speech Recognition in English and Mandarin | |
| Task activating prompting generative correction | 2.11 | Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting | - |
| RobustGER | 2.2 | It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition | |