
Work in Progress: Linear Transformers for TinyML

Luca Benini, Michele Magno, Cristian Cioflan, Moritz Scherer
Abstract

We present WaveFormer, a neural network architecture based on a linear attention transformer that enables long-sequence inference on TinyML devices. WaveFormer achieves new state-of-the-art accuracies of 98.8% and 99.1% on the 12-class and 35-class problems of the Google Speech Commands V2 keyword spotting (KWS) dataset, with only 130 kB of weight storage, compatible with MCU-class devices. Top-1 accuracy improves by 0.1 and 0.9 percentage points while the model size and number of operations are reduced by 2.5× and 4.7×, respectively, compared to the state of the art. We also propose a hardware-friendly 8-bit integer quantization algorithm for the linear attention operator, enabling efficient deployment on low-cost, ultra-low-power microcontrollers without loss of accuracy.
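The abstract does not include code, so the sketch below is only a rough illustration of the linear attention idea it builds on: the softmax is replaced by a kernel feature map φ, giving Attn(Q, K, V) = φ(Q)(φ(K)ᵀV) / (φ(Q)(φ(K)ᵀ1)), which folds the sequence dimension away and reduces the cost from O(N²) to O(N). The feature map φ(x) = ELU(x) + 1 follows the common choice of Katharopoulos et al. (2020); the function names and shapes are illustrative assumptions, not WaveFormer's actual implementation.

```python
import numpy as np

def elu_plus_one(x):
    # Positive feature map phi(x) = ELU(x) + 1, a common choice
    # for linear attention (Katharopoulos et al., 2020).
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, k, v):
    """Kernelized attention in O(N): phi(Q) @ (phi(K)^T @ V), normalized per query.

    q, k: (N, d) queries and keys; v: (N, d_v) values.
    """
    q, k = elu_plus_one(q), elu_plus_one(k)
    kv = k.T @ v                 # (d, d_v): the sequence dimension is folded away
    z = q @ k.sum(axis=0)        # (N,): per-query normalizer, replaces the softmax
    return (q @ kv) / z[:, None]

rng = np.random.default_rng(0)
n, d = 512, 64
out = linear_attention(rng.standard_normal((n, d)),
                       rng.standard_normal((n, d)),
                       rng.standard_normal((n, d)))
print(out.shape)  # (512, 64)
```

Because kv and the normalizer are fixed-size regardless of N, memory stays constant in sequence length, which is what makes long-sequence inference feasible on memory-constrained MCUs.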
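For the quantization claim, here is a minimal sketch of one way 8-bit integer arithmetic could be applied to this operator, assuming symmetric per-tensor scaling; the paper's actual scheme may differ. A convenient property of the linear attention ratio is that the query and key scales appear in both numerator and denominator and cancel, leaving only the value scale in the final rescaling.

```python
import numpy as np

phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))   # ELU(x) + 1 feature map

def quantize_int8(x):
    # Symmetric per-tensor quantization: x ~ s * x_q with x_q in [-127, 127].
    s = np.abs(x).max() / 127.0
    return np.clip(np.round(x / s), -127, 127).astype(np.int8), s

def int8_linear_attention(q, k, v):
    qq, _ = quantize_int8(phi(q))
    kq, _ = quantize_int8(phi(k))
    vq, sv = quantize_int8(v)
    # int64 keeps this reference sketch overflow-free; an on-device kernel
    # would requantize the intermediate phi(K)^T V back to int8 so that all
    # accumulations fit in int32.
    qq, kq, vq = (a.astype(np.int64) for a in (qq, kq, vq))
    kv = kq.T @ vq               # integer matmul, (d, d_v)
    z = qq @ kq.sum(axis=0)      # integer normalizer, (N,)
    # The query and key scales cancel between numerator and denominator,
    # so only the value scale sv survives in the dequantization.
    return sv * (qq @ kv) / z[:, None]
```

This scale cancellation is one reason the operator is hardware-friendly: the inner loops are pure integer matrix multiplies, with a single floating-point (or fixed-point) rescale at the end.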