
Multimodal Contrastive Learning With Joint Example Selection (JEST)

Multimodal Contrastive Learning with Joint Example Selection (JEST) is an algorithm proposed by the DeepMind research team in 2024 in the paper "Data curation via joint example selection further accelerates multimodal learning". JEST aims to address the high energy consumption of training large models (such as ChatGPT). It significantly reduces the computing resources and time required by selecting high-quality sub-batches for training from much larger "super-batches".

The core idea of the JEST algorithm is to combine multimodal contrastive learning with joint example selection. Rather than scoring examples one at a time, it evaluates the learnability of a sub-batch as a whole, then samples according to that score and trains on the sub-batch that is most useful to learn from. This speeds up multimodal learning: with filtering ratios of 50%, 80%, and 90%, only 2 billion, 1 billion, and 670 million training examples are needed, respectively, to match the final performance of the uniform-sampling baseline trained on 3 billion examples.
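The selection step described above can be illustrated with a small sketch. It is not the paper's implementation, and it rests on several assumptions: the learnability of a sub-batch is taken to be the learner's contrastive loss minus a pretrained reference model's loss on the same sub-batch, a softmax contrastive loss stands in for whatever loss the real models use, and candidate sub-batches are drawn at random and then sampled in proportion to their score instead of using an iterative sampler. The names `contrastive_loss`, `select_sub_batch`, `learner_sim`, `reference_sim`, and `n_candidates` are hypothetical. A filtering ratio f means that a fraction 1 - f of the super-batch is kept, so a 90% ratio keeps 100 examples out of 1,000.

```python
import numpy as np


def contrastive_loss(sim):
    """Mean symmetric softmax contrastive loss for a square similarity matrix.

    sim[i, j] is the similarity between image i and text j; the matching
    pairs sit on the diagonal.
    """
    def cross_entropy_on_diagonal(logits):
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy_on_diagonal(sim) + cross_entropy_on_diagonal(sim.T))


def select_sub_batch(learner_sim, reference_sim, filter_ratio,
                     n_candidates=16, rng=None):
    """Pick one sub-batch from a super-batch by joint learnability.

    Assumption: learnability = learner loss - reference loss, computed on
    the whole candidate sub-batch (not per example).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = learner_sim.shape[0]
    k = max(1, int(round(n * (1.0 - filter_ratio))))  # examples to keep

    # Draw random candidate sub-batches (a simplification for illustration).
    candidates = [rng.choice(n, size=k, replace=False) for _ in range(n_candidates)]

    # Score each candidate jointly: the loss depends on which examples
    # appear together, because they act as each other's negatives.
    scores = np.array([
        contrastive_loss(learner_sim[np.ix_(idx, idx)])
        - contrastive_loss(reference_sim[np.ix_(idx, idx)])
        for idx in candidates
    ])

    # Sample a candidate with probability proportional to exp(score).
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    chosen = rng.choice(n_candidates, p=probs)
    return np.sort(candidates[chosen]), float(scores[chosen])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for the similarity matrices of a 20-example super-batch.
    learner_sim = rng.normal(size=(20, 20))
    reference_sim = rng.normal(size=(20, 20))
    indices, score = select_sub_batch(learner_sim, reference_sim,
                                      filter_ratio=0.8, rng=rng)
    print("kept indices:", indices)
    print("learnability score:", score)
```

Scoring the sub-batch jointly matters because, in a contrastive loss, every example also serves as a negative for the others, so the value of an example depends on what it is batched with.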

In addition, the JEST algorithm exploits the synergy between multi-resolution training and online batch selection, which further reduces the computational cost.