Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information

Weijie Su^{1,*†}, Xizhou Zhu^{2,4,*‡}, Chenxin Tao^{3,*†}, Lewei Lu^{2}, Bin Li^{1}, Gao Huang^{3}, Yu Qiao^{4}, Xiaogang Wang^{5,2}, Jie Zhou^{3}, Jifeng Dai^{3,4}

Abstract

To effectively exploit the potential of large-scale models, various pre-training strategies supported by massive data from different sources have been proposed, including supervised pre-training, weakly-supervised pre-training, and self-supervised pre-training. It has been shown that combining multiple pre-training strategies and data from various modalities/sources can greatly boost the training of large-scale models. However, current works adopt a multi-stage pre-training pipeline, whose complexity may increase the uncertainty and instability of pre-training. It is thus desirable to integrate these strategies in a single-stage manner. In this paper, we first propose a general multi-modal mutual information formula as a unified optimization target and demonstrate that all existing approaches are special cases of our framework. Under this unified perspective, we propose an all-in-one single-stage pre-training approach, named Maximizing Multi-modal Mutual Information Pre-training (M3I Pre-training). Our approach achieves better performance than previous pre-training methods on various vision benchmarks, including ImageNet classification, COCO object detection, LVIS long-tailed object detection, and ADE20k semantic segmentation. Notably, we successfully pre-train a billion-level parameter image backbone and achieve state-of-the-art performance on various benchmarks. Code will be released at https://github.com/OpenGVLab/M3I-Pretraining.
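
As a rough illustration of the unified objective described in the abstract (an illustrative sketch only, not the paper's exact formulation; the symbols f_θ, g_φ, x_input, and x_target are our own notation), a single-stage multi-modal mutual-information target can be written as

    \max_{\theta}\; I\big(f_{\theta}(x_{\mathrm{input}});\, g_{\phi}(x_{\mathrm{target}})\big)

where x_input and x_target are two (possibly different-modality) signals derived from the same sample, f_θ is the image backbone being pre-trained, and g_φ encodes the target signal. Under this reading, choosing (x_input, x_target) as (image, class label) corresponds to supervised pre-training, (image, paired text) to weakly-supervised pre-training, and (masked image, original image) to self-supervised pre-training, which is the sense in which existing approaches can be viewed as special cases of a single mutual-information objective.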

