Speech Recognition
Speech technology concerns the ability of computer systems to process spoken human language, covering speech recognition, speech synthesis, and speech understanding. Its goal is to build intelligent systems that interact effectively with users and so improve the user experience. It is widely used in virtual assistants, customer service systems, speech translation, and other areas, and it contributes substantially to making human-computer interaction more natural and convenient.
AISHELL-1
FireRedASR-AED
AISHELL-2
AISHELL-2 Android
AISHELL-2 Mic
AISHELL-2 Test Android
Qwen-Audio
AISHELL-2 Test IOS
AISHELL-2 Test Mic
AMI IMH
AMI SDM1
ATCOSIM corpus (Air Traffic Control Communications)
ATCOSIM dataset (Air Traffic Control Communications)
CALLHOME En
WavLM Large & EEND-vector clustering
CALLHOME Spanish Speech
CAS-VSR-S101
CHiME-6 dev_gss12
CHiME-6 eval
Common Voice
Common Voice 7.0 Abkhaz
Common Voice 7.0 Arabic
Common Voice 7.0 Bashkir
Common Voice 7.0 German
Common Voice 7.0 Hindi
Common Voice 7.0 Odia
Common Voice 7.0 Portuguese
Common Voice 7.0 Votic
Common Voice 8.0 Assamese
Common Voice 8.0 Basaa
Common Voice 8.0 Breton
Common Voice 8.0 Bulgarian
Common Voice 8.0 Central Kurdish
Common Voice 8.0 Dutch
Common Voice 8.0 Erzya
Common Voice 8.0 French
Common Voice 8.0 Galician
Common Voice 8.0 German
Common Voice 8.0 Guarani
Common Voice 8.0 Hausa
Common Voice 8.0 Hindi
Common Voice 8.0 Hungarian
Common Voice 8.0 Japanese
Common Voice 8.0 Kabyle
Common Voice 8.0 Kazakh
Common Voice 8.0 Kurmanji Kurdish
Common Voice 8.0 Maltese
Common Voice 8.0 Marathi
Common Voice 8.0 Odia
Common Voice 8.0 Portuguese
Common Voice 8.0 Punjabi
Common Voice 8.0 Romansh Sursilvan
Common Voice 8.0 Romansh Vallader
Common Voice 8.0 Russian
Common Voice 8.0 Santali (Ol Chiki)
Common Voice 8.0 Serbian
Common Voice 8.0 Slovenian
Common Voice 8.0 Sorbian, Upper
Common Voice 8.0 Swahili
Common Voice 8.0 Tatar
Common Voice 8.0 Uzbek
Common Voice 8.0 Votic
Common Voice Arabic
Common Voice Breton
Common Voice Catalan
Common Voice Chinese (China)
Common Voice Czech
Common Voice Dutch
Common Voice English
Whisper (Large v2)
Common Voice French
ConformerCTC-L (5-gram)
Common Voice Frisian
Common Voice Georgian
Common Voice German
wav2vec 2.0 XLS-R 1B + TEVR (5-gram)
Common Voice Hindi
Common Voice Indonesian
Common Voice Italian
Whisper (Large v2)
Common Voice Japanese
Common Voice Lithuanian
Common Voice Maltese
Common Voice Odia
Common Voice Persian
Common Voice Polish
Common Voice Portuguese
XLSR53 Wav2Vec2 Portuguese by Orlem Santos
Common Voice Russian
Whisper (Large v2)
Common Voice Spanish
ConformerCTC-L (4-gram)
Common Voice Swedish
Common Voice Tamil
Common Voice Turkish
Common Voice vi
khanhld/chunkformer-large-vie
Common Voice Vietnamese
Common Voice Welsh
CORAA
EasyCom
Europarl-ASR EN Guest-test
Europarl-ASR EN MEP-test
facebook/multilingual_librispeech german
TDT 0-4
FLEURS
fon
Fongbe audio
Triphone (39 features) + LDA and MLLT + SGMM
German ASR Data-Mix
GigaSpeech
Conformer/Transformer-AED
GigaSpeech DEV
SAMBA ASR
GigaSpeech TEST
Zipformer+pruned transducer w/ CR-CTC (no external language model)
Google Speech Commands - Musan
Hub5'00 CallHome
Espresso
Hub5'00 FISHER-SWBD
CTC-CRF
Hub5'00 SwitchBoard
Espresso
Kazakh Speech Corpus v1.1
Libri-Light test-clean
wav2vec 2.0 Large-10h-LV-60k
Libri-Light test-other
wav2vec 2.0 Large-10h-LV-60k
LibriCSS
TS-SEP
LibriSpeech 100h test-clean
LibriSpeech 100h test-other
Branchformer + GFSA
LibriSpeech test-clean
HuBERT with Libri-Light
LibriSpeech test-other
wav2vec 2.0 with Libri-Light
LibriSpeech train-clean-100 test-clean
wav2vec_wav2letter
LibriSpeech train-clean-100 test-other
wav2vec_wav2letter
LRS2
RAVEn Large
LRS3-TED
Whisper
MediaSpeech
Quartznet
MLS
Mozilla Common Voice 15.0 Persian
Mozilla Common Voice 16.1
Mozilla Common Voice 9.0
Podlodka.io
projecte-aina/parlament_parla ca
Reazonspeech
Robust Speech Event - Catalan Dev Data
Robust Speech Event - Dev Data
Russian LibriSpeech
SLUE
W2V2-B-VP100K
Speech Commands
Centaurus
SPGI Speech
SPGISpeech
swb_hub_500 WER fullSWBCH
Switchboard (300hr)
Switchboard CallHome
Switchboard + Hub500
Switchboard SWBD
TED-LIUM
Whisper-LLaMa-7b
Tedlium
tedlium-v3
TIMIT
wav2vec 2.0
TUDA
Conformer-Transducer (no LM)
UWB-ATCC dataset (Air Traffic Control Communications)
VietMed
VIVOS
khanhld/chunkformer-large-vie
WenetSpeech
Paraformer-large
WSJ dev93
CTC-CRF ST-NAS
WSJ eval92
WSJ eval93
Deep Speech 2