HyperAIHyperAI

Command Palette

Search for a command to run...

Parameter Re-Initialization through Cyclical Batch Size Schedules

Norman Mu* Zhewei Yao* Amir Gholami Kurt Keutzer Michael W. Mahoney

Abstract

Optimal parameter initialization remains a crucial problem for neural network training. A poor weight initialization may take longer to train and/or converge to sub-optimal solutions. Here, we propose a method of weight re-initialization by repeated annealing and injection of noise in the training process. We implement this through a cyclical batch size schedule motivated by a Bayesian perspective of neural network training. We evaluate our methods through extensive experiments on tasks in language modeling, natural language inference, and image classification. We demonstrate the ability of our method to improve language modeling performance by up to 7.91 perplexity and reduce training iterations by up to 61%61\%61%, in addition to its flexibility in enabling snapshot ensembling and use with adversarial training.


Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing

HyperAI Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp
Parameter Re-Initialization through Cyclical Batch Size Schedules | Papers | HyperAI