Improving Language Understanding by Generative Pre-Training

Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever
Abstract

Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant, labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to perform adequately. We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. In contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve effective transfer while requiring minimal changes to the model architecture. We demonstrate the effectiveness of our approach on a wide range of benchmarks for natural language understanding. Our general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied. For instance, we achieve absolute improvements of 8.9% on commonsense reasoning (Stories Cloze Test), 5.7% on question answering (RACE), and 1.5% on textual entailment (MultiNLI).
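
To make the two-stage recipe in the abstract concrete, below is a minimal sketch (not the paper's released code) of generative pre-training of a left-to-right language model on unlabeled token sequences, followed by discriminative fine-tuning with a classification head and an auxiliary language-modeling loss. The tiny model sizes, the Adam settings, the 0.5 auxiliary weight, and classifying from the final position's hidden state (rather than a dedicated extract token) are illustrative assumptions, not the paper's hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyTransformerLM(nn.Module):
    """Illustrative causal Transformer LM; sizes are assumptions, not the paper's."""
    def __init__(self, vocab_size, d_model=128, nhead=4, num_layers=2, max_len=64):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, 4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)   # next-token prediction
        self.cls_head = nn.Linear(d_model, 2)           # task head added for fine-tuning

    def hidden(self, x):
        # Causal (left-to-right) self-attention over the token sequence.
        pos = torch.arange(x.size(1), device=x.device)
        h = self.tok(x) + self.pos(pos)
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
        return self.blocks(h, mask=mask)

    def lm_loss(self, x):
        # Stage 1, generative pre-training: maximize log P(u_i | u_<i).
        h = self.hidden(x)
        logits = self.lm_head(h[:, :-1])
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)), x[:, 1:].reshape(-1))

    def finetune_loss(self, x, y, aux_weight=0.5):
        # Stage 2, discriminative fine-tuning: classify from the last position's
        # state; the LM objective is kept as an auxiliary loss (weight is an assumption).
        h = self.hidden(x)
        cls_logits = self.cls_head(h[:, -1])
        return F.cross_entropy(cls_logits, y) + aux_weight * self.lm_loss(x)

# Usage on random data, only to show the two stages end to end.
model = TinyTransformerLM(vocab_size=1000)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
tokens = torch.randint(0, 1000, (8, 32))                       # "unlabeled text"
opt.zero_grad(); model.lm_loss(tokens).backward(); opt.step()  # pre-training step
labels = torch.randint(0, 2, (8,))                             # labeled task data
opt.zero_grad(); model.finetune_loss(tokens, labels).backward(); opt.step()  # fine-tuning step
```

Task-aware input transformations, as described in the paper, would serialize each task's structured input (e.g. premise and hypothesis, or question and candidate answers) into a single token sequence with delimiter tokens, so the same pre-trained model and minimal head can be reused across tasks.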