6 months ago

Anmol Gulati James Qin Chung-Cheng Chiu Niki Parmar Yu Zhang Jiahui Yu Wei Han Shibo Wang Zhengdong Zhang Yonghui Wu

Abstract

Recently Transformer and Convolution neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural networks (RNNs). Transformer models are good at capturing content-based global interactions, while CNNs exploit local features effectively. In this work, we achieve the best of both worlds by studying how to combine convolution neural networks and transformers to model both local and global dependencies of an audio sequence in a parameter-efficient way. To this regard, we propose the convolution-augmented transformer for speech recognition, named Conformer. Conformer significantly outperforms the previous Transformer and CNN based models achieving state-of-the-art accuracies. On the widely used LibriSpeech benchmark, our model achieves WER of 2.1%/4.3% without using a language model and 1.9%/3.9% with an external language model on test/testother. We also observe competitive performance of 2.7%/6.3% with a small model of only 10M parameters.

Source PDF

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

HyperAI

6 months ago

Transformer

Convolutional Neural Network

Audio and Speech Processing

Method/Architecture

Audio

Task/Problem

Anmol Gulati James Qin Chung-Cheng Chiu Niki Parmar Yu Zhang Jiahui Yu Wei Han Shibo Wang Zhengdong Zhang Yonghui Wu

Abstract

Source PDF

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

HyperAI

6 months ago

Transformer

Convolutional Neural Network

Audio and Speech Processing

Method/Architecture

Audio

Task/Problem

Anmol Gulati James Qin Chung-Cheng Chiu Niki Parmar Yu Zhang Jiahui Yu Wei Han Shibo Wang Zhengdong Zhang Yonghui Wu

Abstract

Source PDF

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

Command Palette

Conformer: Convolution-augmented Transformer for Speech Recognition

Anmol Gulati James Qin Chung-Cheng Chiu Niki Parmar Yu Zhang Jiahui Yu Wei Han Shibo Wang Zhengdong Zhang Yonghui Wu1 more

Abstract

Build AI with AI

HyperAI Newsletters

Command Palette

Conformer: Convolution-augmented Transformer for Speech Recognition

Anmol Gulati James Qin Chung-Cheng Chiu Niki Parmar Yu Zhang Jiahui Yu Wei Han Shibo Wang Zhengdong Zhang Yonghui Wu1 more

Abstract

Build AI with AI

HyperAI Newsletters

Command Palette

Conformer: Convolution-augmented Transformer for Speech Recognition

Anmol Gulati James Qin Chung-Cheng Chiu Niki Parmar Yu Zhang Jiahui Yu Wei Han Shibo Wang Zhengdong Zhang Yonghui Wu1 more

Abstract

Build AI with AI

HyperAI Newsletters

Anmol Gulati James Qin Chung-Cheng Chiu Niki Parmar Yu Zhang Jiahui Yu Wei Han Shibo Wang Zhengdong Zhang Yonghui Wu

Anmol Gulati James Qin Chung-Cheng Chiu Niki Parmar Yu Zhang Jiahui Yu Wei Han Shibo Wang Zhengdong Zhang Yonghui Wu

Anmol Gulati James Qin Chung-Cheng Chiu Niki Parmar Yu Zhang Jiahui Yu Wei Han Shibo Wang Zhengdong Zhang Yonghui Wu