NVIDIA's Open-Source Speech Recognition Model ParaKeet-tdt-0.6b-v2 Can Transcribe 1 Hour of Audio in Just 1 Second, Accurately Recognizing Sundar Pichai's Speech

Whether the task is understanding user intent in real time for intelligent customer service, or recognizing audio with varied speaking speeds and accents in scenarios such as meeting records, interview transcription, and subtitle generation, ever-growing usage needs place stricter demands on speech recognition technology: recognition speed, usage cost, and accuracy and stability in noisy environments.
To address these challenges, NVIDIA recently open-sourced the speech recognition model ParaKeet-tdt-0.6b-v2. Built on the FastConformer architecture and NVIDIA's self-developed TDT (Token-and-Duration Transducer) decoder, it achieves extremely high inference efficiency: it takes only 1 second to process 60 minutes of audio, surpassing all mainstream closed-source models. The model focuses on high-precision, low-latency English speech transcription, which makes it suitable for real-time English speech-to-text scenarios, easing cross-language communication and making meeting records smoother.
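To put the speed claim in perspective, the short Python sketch below turns "60 minutes of audio in 1 second" into an inverse real-time factor (audio duration divided by processing time). The function names and the 8-hour archive are illustrative assumptions, not part of the model or the tutorial, and actual throughput depends on hardware and batch size.

```python
def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many seconds of audio are processed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def estimated_processing_seconds(audio_seconds: float, speedup: float) -> float:
    """Estimate compute time for a recording at a given speed-up factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


# The article's claim: 60 minutes of audio transcribed in 1 second.
claimed_speedup = real_time_factor(60 * 60, 1.0)
print(f"Claimed speed-up: {claimed_speedup:.0f}x real time")

# Hypothetical example: an 8-hour meeting archive at the same rate.
archive_seconds = 8 * 60 * 60
print(f"Estimated time: {estimated_processing_seconds(archive_seconds, claimed_speedup):.0f} s")
```

At the claimed rate the model runs 3,600 times faster than real time, so the hypothetical 8-hour archive would need roughly 8 seconds of compute.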
At present, the "ParaKeet-tdt-0.6b-v2 Speech Recognition" demo is available in the "Tutorial" section of HyperAI's official website. Click the link below to try the one-click deployment tutorial.
Tutorial Link:
Demo Run
1. After entering the hyper.ai homepage, select the "Tutorial" page, select "ParaKeet-tdt-0.6b-v2 Speech Recognition", and click "Run this tutorial online".


2. After the page jumps, click "Clone" in the upper right corner to clone the tutorial into your own container.

3. Select the "NVIDIA GeForce RTX 4090" compute resource and the "PyTorch" image. The OpenBayes platform offers 4 billing methods; choose "Pay as you go" or "Pay per day/week/month" according to your needs, then click "Continue". New users who register through the invitation link below receive 4 hours of free RTX 4090 time and 5 hours of free CPU time!
HyperAI exclusive invitation link (copy and open in browser):
https://openbayes.com/console/signup?r=Ada0322_NR0n


4. Wait for resources to be allocated. The first clone takes about 2 minutes. When the status changes to "Running", click the jump arrow next to "API Address" to open the Demo page. Please note that users must complete real-name authentication before they can use the API address access function.


Effect Demonstration
Upload the audio file in "Upload Audio File" and then click "Transcribe Uploaded File" to recognize it. Here, I uploaded an audio clip of a Google I/O keynote speech, and the model recognized it quickly and accurately.

The transcription result is as follows:
Hello everyone, good morning.
Welcome to Google.io.
I learned that today is the start of Gemini season.
Not really sure what the big deal is.
Every day is Gemini season here at Google.
A couple of weeks ago, Gemini completed Pokemon Blue.
In addition, the ParaKeet-tdt-0.6b-v2 demo also supports voice input. Click "Microphone", then click "Record"; when the recording is finished, click "Transcribe Uploaded File" to recognize it.
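For readers who want to reproduce the upload-and-transcribe step outside the hosted demo, the following is a minimal Python sketch, assuming the NVIDIA NeMo toolkit is installed and that the demo serves the checkpoint published on Hugging Face as nvidia/parakeet-tdt-0.6b-v2. The helper names and the extension filter are illustrative assumptions; this is not the code behind the demo page.

```python
from pathlib import Path

# Assumption: the demo uses the checkpoint published under this Hugging Face ID.
MODEL_NAME = "nvidia/parakeet-tdt-0.6b-v2"
# Assumption: accept only .wav and .flac inputs in this sketch.
SUPPORTED_SUFFIXES = {".wav", ".flac"}


def select_audio_files(paths):
    """Return the paths whose extension is one of the supported audio formats."""
    return [str(p) for p in paths if Path(p).suffix.lower() in SUPPORTED_SUFFIXES]


def transcribe_files(paths):
    """Transcribe audio files locally and return one text string per file."""
    # Imported lazily so the helper above works without NeMo installed.
    # Install with: pip install -U "nemo_toolkit[asr]"
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.ASRModel.from_pretrained(model_name=MODEL_NAME)
    hypotheses = model.transcribe(select_audio_files(paths))
    return [hyp.text for hyp in hypotheses]
```

Calling transcribe_files(["keynote.wav"]), where keynote.wav is a hypothetical local recording, would download the checkpoint on first use and return a list with one transcript string per accepted file.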

That concludes the practical tutorial recommended by HyperAI this time. Everyone is welcome to try it!
Tutorial Link: