
Language Models are Unsupervised Multitask Learners

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever

Abstract

Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets. We demonstrate that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText. When conditioned on a document plus questions, the answers generated by the language model reach 55 F1 on the CoQA dataset - matching or exceeding the performance of 3 out of 4 baseline systems without using the 127,000+ training examples. The capacity of the language model is essential to the success of zero-shot task transfer and increasing it improves performance in a log-linear fashion across tasks. Our largest model, GPT-2, is a 1.5B parameter Transformer that achieves state of the art results on 7 out of 8 tested language modeling datasets in a zero-shot setting but still underfits WebText. Samples from the model reflect these improvements and contain coherent paragraphs of text. These findings suggest a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
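
The zero-shot behaviour described in the abstract — conditioning the model on a document followed by a question and reading the continuation as the answer — can be illustrated with a minimal sketch. The example below uses the Hugging Face `transformers` library and the public `gpt2` checkpoint as stand-ins; the passage, question text, and decoding settings are illustrative assumptions, not the paper's actual CoQA evaluation pipeline.

```python
# Minimal sketch of zero-shot question answering by conditioning a language
# model on a document plus a question, in the spirit of the paper's CoQA
# experiment. Uses the Hugging Face `transformers` GPT-2 checkpoint as a
# stand-in; this is NOT the authors' original code or evaluation setup.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")   # assumed public checkpoint
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Placeholder document and question, formatted so the model's continuation
# after "A:" can be read as its answer.
document = "The Amazon is the largest rainforest in the world."
question = "Q: What is the largest rainforest in the world?\nA:"
prompt = document + "\n" + question

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=20,                      # keep the generated answer short
    do_sample=False,                        # greedy decoding for a deterministic sketch
    pad_token_id=tokenizer.eos_token_id,
)
# Strip the prompt tokens and decode only the newly generated continuation.
answer = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer.strip())
```

In the paper's setting no task-specific head or fine-tuning is involved: the answer is simply the most likely continuation of the document-plus-question prompt, which is why model capacity matters so much for zero-shot transfer.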

