9 months ago

Abstract

Deep neural networks are typically trained by optimizing a loss function with an SGD variant, in conjunction with a decaying learning rate, until convergence. We show that simple averaging of multiple points along the trajectory of SGD, with a cyclical or constant learning rate, leads to better generalization than conventional training. We also show that this Stochastic Weight Averaging (SWA) procedure finds much flatter solutions than SGD, and approximates the recent Fast Geometric Ensembling (FGE) approach with a single model. Using SWA we achieve notable improvement in test accuracy over conventional SGD training on a range of state-of-the-art residual networks, PyramidNets, DenseNets, and Shake-Shake networks on CIFAR-10, CIFAR-100, and ImageNet. In short, SWA is extremely easy to implement, improves generalization, and has almost no computational overhead.
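The averaging step described in the abstract can be illustrated with a minimal sketch. It assumes plain SGD with a constant learning rate on weights stored as a flat list, and it takes one snapshot every `cycle_len` steps once `swa_start` is reached. The names `swa_update`, `sgd_with_swa`, `grad_fn`, `swa_start`, and `cycle_len` are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
def swa_update(w_swa, w, n_models):
    """Fold snapshot `w` into the running mean of `n_models` earlier snapshots."""
    return [(a * n_models + b) / (n_models + 1) for a, b in zip(w_swa, w)]


def sgd_with_swa(grad_fn, w0, lr, steps, swa_start, cycle_len):
    """Run constant-learning-rate SGD and average periodic weight snapshots.

    grad_fn   -- hypothetical callable returning a (possibly noisy) gradient list
    swa_start -- first step (1-indexed) at which a snapshot is taken
    cycle_len -- number of steps between consecutive snapshots
    Returns (final SGD weights, averaged weights).
    """
    w = list(w0)
    w_swa = [0.0] * len(w)  # placeholder; overwritten by the first snapshot
    n_models = 0
    for step in range(1, steps + 1):
        g = grad_fn(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]  # plain SGD step
        if step >= swa_start and (step - swa_start) % cycle_len == 0:
            w_swa = swa_update(w_swa, w, n_models)
            n_models += 1
    return w, w_swa
```

The extra cost over ordinary training in this sketch is one additional copy of the weights and one running-mean update per snapshot. For networks with batch normalization, the normalization statistics would also need to be recomputed for the averaged weights; that step is omitted here.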


Neural Networks

Machine Learning

Convolutional Neural Network


Averaging Weights Leads to Wider Optima and Better Generalization

Pavel Izmailov*1 Dmitrii Podoprikhin*2,3 Timur Garipov*4,5 Dmitry Vetrov2,3 Andrew Gordon Wilson1
