HyperAI

Pre-training Once

Pre-training Once (POA) is a concept proposed by Ant Group in the paper “POA: Pre-training Once for Models of All Sizes”. POA is a three-branch self-supervised training framework that introduces an elastic student branch to randomly sample sub-networks for training in each pre-training step. With a single pre-training run, POA can produce models of various sizes that are ready for downstream tasks. Experiments show that it achieves state-of-the-art performance on multiple tasks.
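As a rough illustration of how a single pre-training run can yield many deployable model sizes, the sketch below enumerates a hypothetical grid of sub-network depths and widths that could be sliced from one weight-sharing super-network; the specific values are assumptions for illustration, not the grid used in the paper.

```python
# Hypothetical grid of sub-network sizes extractable from one weight-sharing
# super-network; the depth and width candidates below are illustrative only.
from itertools import product

depths = range(4, 13)                 # assumed candidate block counts
widths = range(384, 769, 48)          # assumed candidate embedding widths
configs = list(product(depths, widths))
print(f"{len(configs)} extractable sub-models, e.g. depth/width = {configs[0]}")
```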

Background

Large-scale self-supervised pre-training paves the way for one base model to handle many different visual tasks. Most pre-training methods, however, train only one model of a specific size at a time. In real-world scenarios, varying computational and storage constraints demand substantial effort to develop a whole range of model sizes for deployment. This study addresses these issues.

Pre-training Once Overview

POA introduces an innovative elastic student branch into the modern self-distillation paradigm. In each pre-training step, the research team randomly samples a sub-network from the original student to form the elastic student and trains all branches in a self-distilling manner. Once pre-training is complete, POA can extract pre-trained models of different sizes for downstream tasks. Notably, the elastic student enables the simultaneous pre-training of multiple models of different sizes and also serves as an additional ensemble of models of various sizes that enhances representation learning. Extensive experiments, including k-nearest-neighbor classification, linear probing, and evaluation on multiple downstream tasks, demonstrate the effectiveness and advantages of POA. It achieves state-of-the-art performance with ViT, Swin Transformer, and ResNet backbones, generating around a hundred models of different sizes from a single pre-training session.
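The following is a minimal, self-contained sketch (in PyTorch) of the three-branch idea described above, assuming a toy weight-sharing encoder rather than the paper's actual ViT, Swin Transformer, or ResNet backbones; the layer sizes, sampling grid, loss, and EMA rate are all illustrative assumptions, not the paper's implementation.

```python
import copy
import random

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy weight-sharing "super" encoder; narrower/shallower sub-networks are
# sliced from its parameters. Sizes here are illustrative assumptions.
MAX_WIDTH, MAX_DEPTH, EMBED = 256, 4, 128

class SuperEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.inp = nn.Linear(EMBED, MAX_WIDTH)
        self.blocks = nn.ModuleList(nn.Linear(MAX_WIDTH, MAX_WIDTH) for _ in range(MAX_DEPTH))
        self.out = nn.Linear(MAX_WIDTH, EMBED)

    def forward(self, x, width=MAX_WIDTH, depth=MAX_DEPTH):
        # Slice the shared weights so a sub-network reuses the full model's parameters.
        h = F.linear(x, self.inp.weight[:width], self.inp.bias[:width]).relu()
        for blk in self.blocks[:depth]:
            h = F.linear(h, blk.weight[:width, :width], blk.bias[:width]).relu()
        return F.linear(h, self.out.weight[:, :width], self.out.bias)

def distill(student_out, teacher_out, temp=0.1):
    # Cross-entropy between teacher and student output distributions.
    t = F.softmax(teacher_out / temp, dim=-1)
    return torch.sum(-t * F.log_softmax(student_out / temp, dim=-1), dim=-1).mean()

student = SuperEncoder()
teacher = copy.deepcopy(student)              # teacher branch, updated by EMA only
for p in teacher.parameters():
    p.requires_grad_(False)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)

for step in range(10):                        # stand-in for the pre-training loop
    x = torch.randn(32, EMBED)                # stand-in for augmented image views
    with torch.no_grad():
        t_out = teacher(x)                    # teacher branch (full size, frozen)
    s_out = student(x)                        # intact student branch (full size)
    width = random.choice([64, 128, 192, 256])    # randomly sampled sub-network
    depth = random.choice([1, 2, 3, 4])
    e_out = student(x, width=width, depth=depth)  # elastic student branch
    loss = distill(s_out, t_out) + distill(e_out, t_out)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():                     # EMA update of the teacher
        for tp, sp in zip(teacher.parameters(), student.parameters()):
            tp.mul_(0.996).add_(sp, alpha=0.004)
```

In this sketch, a sub-model of any sampled size could be extracted after pre-training simply by running the encoder with a fixed width and depth and copying the corresponding weight slices; how the actual POA models are exported follows the paper's own procedure.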