HyperAI

Granary European Speech Recognition and Translation Dataset

Date: 3 months ago
Size: 50.49 GB
Organization: NVIDIA
Paper URL: 2505.13404v2

Granary is a large-scale multilingual speech dataset released by NVIDIA's multi-site research team in 2025. It accompanies the paper "Granary: Speech Recognition and Translation Dataset in 25 European Languages" and aims to provide high-quality training and evaluation material for multilingual automatic speech recognition (ASR) and automatic speech translation (AST) models.

The dataset contains approximately 1 million hours of high-quality pseudo-labeled speech data for ASR, covering 25 European languages: 23 EU languages plus Ukrainian and Russian. The audio is sourced from publicly available speech corpora and passed through a unified pseudo-labeling and quality-filtering pipeline.

Languages include:

Bulgarian, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Croatian, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovenian, Swedish, Ukrainian and Russian.

Granary.torrent
Seeding: 1 | Downloading: 0 | Completed: 12 | Total Downloads: 36
  • Granary/
    • README.md (1.66 KB)
    • README.txt (3.31 KB)
    • data/
      • Granary.zip (50.49 GB)
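
Because the archive is about 50 GB, it can help to inspect its contents before extracting anything. The following is a minimal sketch using only Python's standard `zipfile` module. The path `Granary/data/Granary.zip` is taken from the file tree above; the internal layout of the archive is not documented on this page, so the helper `summarize_archive` (a hypothetical name) only lists whatever members it finds and makes no assumption about file names or formats.

```python
import zipfile
from pathlib import Path


def summarize_archive(zip_path, limit=20):
    """Return (member_count, total_uncompressed_bytes, first_member_names).

    Reads only the zip central directory, so nothing is extracted to disk.
    """
    with zipfile.ZipFile(zip_path) as archive:
        infos = archive.infolist()
        total_bytes = sum(info.file_size for info in infos)
        names = [info.filename for info in infos[:limit]]
    return len(infos), total_bytes, names


if __name__ == "__main__":
    # Assumed location, based on the torrent file tree; adjust as needed.
    zip_path = Path("Granary") / "data" / "Granary.zip"
    if zip_path.exists():
        count, total_bytes, names = summarize_archive(zip_path)
        print(f"{count} members, {total_bytes / 1e9:.2f} GB uncompressed")
        for name in names:
            print(name)
    else:
        print(f"{zip_path} not found")
```

Listing the members first shows how the data is organized (for example, whether it is split per language) before committing disk space to a full extraction.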
