HyperAI

Whisper Web Online Speech Recognition Tool

Introduction

Whisper is a speech-to-text model that OpenAI open-sourced in 2022. Its transcription quality has been widely praised. This tutorial is based on Whisper Web, an open-source project on GitHub that runs Whisper directly in the browser.

Whisper Web performs ML-based speech recognition in the browser and can be accelerated with WebGPU. It accepts audio from an online or local file, or from a live recording, and supports more than 100 languages. The recognized text can be exported as a TXT or JSON file, and can also be translated directly into English.
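To make the two export formats concrete, here is a minimal sketch of how a timestamped transcript could map to TXT and JSON. The chunk layout (a text field plus a start/end timestamp in seconds) and the helper names `to_txt` and `to_json` are assumptions made for illustration; the exact schema that Whisper Web exports may differ.

```python
import json
from typing import List, Optional, Tuple, TypedDict


class Chunk(TypedDict):
    # Assumed layout: one recognized segment with its start/end time in seconds.
    text: str
    timestamp: Tuple[float, Optional[float]]


def to_txt(chunks: List[Chunk]) -> str:
    """Join the segment texts into one plain-text transcript (timestamps dropped)."""
    return "".join(chunk["text"] for chunk in chunks).strip()


def to_json(chunks: List[Chunk]) -> str:
    """Serialize the segments, timestamps included, as indented JSON."""
    return json.dumps(chunks, ensure_ascii=False, indent=2)


# Hypothetical transcript used only to show the two shapes.
example_chunks: List[Chunk] = [
    {"text": " Hello world.", "timestamp": (0.0, 2.5)},
    {"text": " This is a test.", "timestamp": (2.5, 5.0)},
]

print(to_txt(example_chunks))
print(to_json(example_chunks))
```

The TXT form keeps only the words, which suits reading or pasting elsewhere; the JSON form keeps the timestamps, which is what you would want for subtitles or for aligning text with the audio.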

Demo

How to run (after starting the container, allow about 10 seconds for initialization before performing the following steps)

1. After cloning and starting the container, copy the API address into your browser

2. Provide audio by loading an online or local file, or by recording it on the spot

3. Select the model according to your needs

4. Once the model is selected, run the transcription to generate the results
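Because the container needs about 10 seconds to initialize, opening the API address too early may show an error page. The sketch below polls an address until it answers, as one possible way to script the wait before step 1. The function names and the example address are hypothetical; substitute the API address shown for your own container.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str) -> bool:
    """Return True if the URL answers with a successful HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=5):
            return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


def wait_until_ready(
    url: str,
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    probe: Callable[[str], bool] = http_probe,
) -> bool:
    """Poll `url` until `probe` succeeds or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() + interval_s > deadline:
            return False
        time.sleep(interval_s)


# Example call (hypothetical address, not a real endpoint):
# wait_until_ready("http://localhost:8080")
```

The `probe` parameter exists only so that the check can be swapped out, for example in a test that should not touch the network.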