

Automatic Lyrics Transcription On Jam Alt 3

Metrics

Case-Sensitive Word Error Rate

Line break F-1

Punctuation F-1

Word Error Rate (WER)
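
As a rough illustration of the two word-level metrics listed above, the sketch below computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, expressed as a percentage. This is not the benchmark's own scoring code: the whitespace tokenization, the lowercasing used for the case-insensitive variant, and the function names `edit_distance` and `wer` are all assumptions made for the example. The benchmark's actual text normalization and punctuation handling are not described on this page.

```python
def edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two lists of words."""
    # prev[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis, case_sensitive=False):
    """Word error rate in percent (assumed simplified definition)."""
    if not case_sensitive:
        # Assumption: the case-insensitive variant just lowercases both texts.
        reference = reference.lower()
        hypothesis = hypothesis.lower()
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    ref = "Hello world again"
    hyp = "hello world"
    print(round(wer(ref, hyp), 1))                       # case-insensitive
    print(round(wer(ref, hyp, case_sensitive=True), 1))  # case-sensitive
```

Under this definition the case-sensitive score can never be lower than the case-insensitive one for the same pair of texts, which matches the pattern in the results table wherever both values are reported.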

Results

Performance results of various models on this benchmark

Model Name	Case-Sensitive Word Error Rate	Line break F-1	Punctuation F-1	Word Error Rate (WER)	Paper Title	Repository
Whisper v2 +lang	26.0	71.7	48.4	19.9	Lyrics Transcription for Humans: A Readability-Aware Benchmark
Whisper v2 +demucs	70.4	67.3	49.1	65.2	Lyrics Transcription for Humans: A Readability-Aware Benchmark
Whisper v3 +demucs	47.4	71.9	45.4	43.5	Lyrics Transcription for Humans: A Readability-Aware Benchmark
Whisper v3	44.6	71.1	47.3	40.7	Lyrics Transcription for Humans: A Readability-Aware Benchmark
Whisper v2 +demucs +lang	30.4	70.6	49.2	23.9	Lyrics Transcription for Humans: A Readability-Aware Benchmark
Whisper v2	-	69.9	38.7	45.4	Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark
OWSM v3.1 +lang	71.8	40.7	28.6	63.3	Lyrics Transcription for Humans: A Readability-Aware Benchmark
Whisper v3 +demucs +lang	44.9	70.5	46.9	40.8	Lyrics Transcription for Humans: A Readability-Aware Benchmark
Whisper v2	59.3	70.0	47.1	54.5	Lyrics Transcription for Humans: A Readability-Aware Benchmark
Whisper v3 +lang	40.4	71.1	47.4	35.9	Lyrics Transcription for Humans: A Readability-Aware Benchmark
Whisper v2 +demucs	-	67.5	30.2	65.2	Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark
AudioShake v1	-	81.2	48.5	24.4	Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark
Whisper v3 +demucs	-	72.0	34.0	43.5	Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark
Whisper v3	-	71.2	41.2	40.7	Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark
OWSM v3.1 +demucs +lang	62.0	41.4	24.7	51.8	Lyrics Transcription for Humans: A Readability-Aware Benchmark
AudioShake v3	17.5	83.7	57.1	12.6	Lyrics Transcription for Humans: A Readability-Aware Benchmark
