6 months ago

Haipeng Luo Qingfeng Sun Can Xu Pu Zhao Jianguang Lou Chongyang Tao Xiubo Geng Qingwei Lin Shifeng Chen Yansong Tang

Abstract

Large language models (LLMs), such as GPT-4, have shown remarkable performance in natural language processing (NLP) tasks, including challenging mathematical reasoning. However, most existing open-source models are only pre-trained on large-scale internet data and without math-related optimization. In this paper, we present WizardMath, which enhances the mathematical CoT reasoning abilities of LLMs without using external python tools, by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. Through extensive experiments on two mathematical reasoning benchmarks, namely GSM8k and MATH, we reveal the extraordinary capabilities of our model. Remarkably, WizardMath-Mistral 7B surpasses top-tier open-source LLMs by a substantial margin with higher data efficiency. Furthermore, WizardMath 70B even outperforms GPT-3.5-Turbo, Claude 2, Gemini Pro and GPT-4-early-version. Additionally, our preliminary exploration highlights the pivotal role of instruction evolution and process supervision in achieving exceptional math performance. For more details refer to https://github.com/nlpxucan/WizardLM

Source PDF

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

HyperAI

6 months ago

LLM

Intelligent Question Answering

Supervised Fine-Tuning

Method/Architecture

Natural Language Processing

Task/Problem

Haipeng Luo Qingfeng Sun Can Xu Pu Zhao Jianguang Lou Chongyang Tao Xiubo Geng Qingwei Lin Shifeng Chen Yansong Tang

Abstract

Source PDF

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

HyperAI

6 months ago

LLM

Intelligent Question Answering

Supervised Fine-Tuning

Method/Architecture

Natural Language Processing

Task/Problem

Haipeng Luo Qingfeng Sun Can Xu Pu Zhao Jianguang Lou Chongyang Tao Xiubo Geng Qingwei Lin Shifeng Chen Yansong Tang

Abstract

Source PDF

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

Command Palette

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

Haipeng Luo Qingfeng Sun Can Xu Pu Zhao Jianguang Lou Chongyang Tao Xiubo Geng Qingwei Lin Shifeng Chen Yansong Tang1 more

Abstract

Build AI with AI

HyperAI Newsletters

Command Palette

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

Haipeng Luo Qingfeng Sun Can Xu Pu Zhao Jianguang Lou Chongyang Tao Xiubo Geng Qingwei Lin Shifeng Chen Yansong Tang1 more

Abstract

Build AI with AI

HyperAI Newsletters

Command Palette

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

Haipeng Luo Qingfeng Sun Can Xu Pu Zhao Jianguang Lou Chongyang Tao Xiubo Geng Qingwei Lin Shifeng Chen Yansong Tang1 more

Abstract

Build AI with AI

HyperAI Newsletters

Haipeng Luo Qingfeng Sun Can Xu Pu Zhao Jianguang Lou Chongyang Tao Xiubo Geng Qingwei Lin Shifeng Chen Yansong Tang

Haipeng Luo Qingfeng Sun Can Xu Pu Zhao Jianguang Lou Chongyang Tao Xiubo Geng Qingwei Lin Shifeng Chen Yansong Tang

Haipeng Luo Qingfeng Sun Can Xu Pu Zhao Jianguang Lou Chongyang Tao Xiubo Geng Qingwei Lin Shifeng Chen Yansong Tang