6 months ago

Saleh Soltan, Shankar Ananthakrishnan, Jack FitzGerald, Rahul Gupta, Wael Hamza, Haidar Khan, Charith Peris, Stephen Rawls, Andy Rosenbaum, Anna Rumshisky, and 6 more

Abstract

In this work, we demonstrate that multilingual large-scale sequence-to-sequence (seq2seq) models, pre-trained on a mixture of denoising and Causal Language Modeling (CLM) tasks, are more efficient few-shot learners than decoder-only models on various tasks. In particular, we train a 20-billion-parameter multilingual seq2seq model called the Alexa Teacher Model (AlexaTM 20B) and show that it achieves state-of-the-art (SOTA) performance on 1-shot summarization tasks, outperforming a much larger 540B PaLM decoder model. AlexaTM 20B also achieves SOTA in 1-shot machine translation, especially for low-resource languages, across almost all language pairs supported by the model (Arabic, English, French, German, Hindi, Italian, Japanese, Marathi, Portuguese, Spanish, Tamil, and Telugu) on the Flores-101 dataset. We also show that, in the zero-shot setting, AlexaTM 20B outperforms GPT3 (175B) on the SuperGLUE and SQuADv2 datasets and provides SOTA performance on multilingual tasks such as XNLI, XCOPA, Paws-X, and XWinograd. Overall, our results present a compelling case for seq2seq models as a powerful alternative to decoder-only models for Large-scale Language Model (LLM) training.
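The pre-training mixture mentioned in the abstract can be pictured with a small sketch. It shows one plausible way to build seq2seq training pairs for the two tasks: a denoising pair (corrupted input, original text as target) and a CLM pair (prefix as input, continuation as target). The abstract does not give these details, so everything specific here is an assumption made for illustration: the `[CLM]` marker, the span-dropping corruption, the function names, and the values of `drop_frac`, `span_len`, and `clm_prob`.

```python
import random

CLM_MARKER = "[CLM]"  # hypothetical task marker, not taken from the abstract


def make_denoising_example(tokens, rng, drop_frac=0.15, span_len=3):
    """Drop random spans from the input; the target is the full original text.

    drop_frac and span_len are illustrative values, not the authors' settings.
    """
    n = len(tokens)
    n_drop = max(1, int(n * drop_frac))
    dropped = set()
    while len(dropped) < n_drop:
        start = rng.randrange(n)
        for i in range(start, min(start + span_len, n)):
            if len(dropped) < n_drop:
                dropped.add(i)
    source = [t for i, t in enumerate(tokens) if i not in dropped]
    return source, list(tokens)


def make_clm_example(tokens, rng):
    """Give the encoder a marked prefix; the decoder must continue it."""
    # Requires at least two tokens so both sides are non-empty.
    split = rng.randrange(1, len(tokens))
    return [CLM_MARKER] + list(tokens[:split]), list(tokens[split:])


def sample_example(tokens, rng, clm_prob=0.2):
    """Pick one of the two tasks at random; clm_prob is an assumed ratio."""
    if rng.random() < clm_prob:
        return "clm", make_clm_example(tokens, rng)
    return "denoise", make_denoising_example(tokens, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    text = "the quick brown fox jumps over the lazy dog again".split()
    for _ in range(3):
        kind, (source, target) = sample_example(text, rng)
        print(kind, "|", " ".join(source), "->", " ".join(target))
```

In both cases the model sees an encoder input and learns to produce a decoder target, which is what lets one seq2seq model serve both as a denoiser and as a prompt-continuing generator in few-shot use.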

AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model
