Abstract

Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for images. We train a sequence Transformer to auto-regressively predict pixels, without incorporating knowledge of the 2D input structure. Despite training on low-resolution ImageNet without labels, we find that a GPT-2 scale model learns strong image representations as measured by linear probing, fine-tuning, and low-data classification. On CIFAR-10, we achieve 96.3% accuracy with a linear probe, outperforming a supervised Wide ResNet, and 99.0% accuracy with full fine-tuning, matching the top supervised pre-trained models. An even larger model trained on a mixture of ImageNet and web images is competitive with self-supervised benchmarks on ImageNet, achieving 72.0% top-1 accuracy on a linear probe of our features.
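
The auto-regressive pixel objective described above can be illustrated with a minimal sketch. It assumes an image already reduced to discrete pixel tokens and flattened in raster order, which is one way to discard the 2D structure. The function names, the 4x4 toy image, and the 16-token palette are hypothetical, and a uniform predictor stands in for the Transformer, so this shows only the shape of the next-pixel loss, not the authors' implementation.

```python
import numpy as np

def raster_sequence(image):
    """Flatten an (H, W) array of discrete pixel tokens into a 1D raster-order sequence."""
    return np.asarray(image).reshape(-1)

def next_token_pairs(seq):
    """Shift the sequence by one so that targets[i] is the token following inputs[i]."""
    return seq[:-1], seq[1:]

def autoregressive_nll(logits, targets):
    """Mean negative log-likelihood of targets under per-position logits of shape (T, V)."""
    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

# Toy data: a 4x4 "image" over a hypothetical 16-token palette (assumption).
VOCAB_SIZE = 16
rng = np.random.default_rng(0)
image = rng.integers(0, VOCAB_SIZE, size=(4, 4))

seq = raster_sequence(image)
inputs, targets = next_token_pairs(seq)

# Stand-in for a trained model: all-zero logits give a uniform distribution
# over the palette at every position.
uniform_logits = np.zeros((len(targets), VOCAB_SIZE))
loss = autoregressive_nll(uniform_logits, targets)
```

With a uniform predictor the loss equals the log of the palette size, which is the baseline a trained sequence model would need to beat.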

Mark Chen, Jeff Wu, Rewon Child, Ilya Sutskever, David Luan, Alec Radford, Heewoo Jun, Prafulla Dhariwal

Generative Pretraining from Pixels
