QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs


Abstract

We propose QeRL, a Quantization-enhanced Reinforcement Learning framework for large language models (LLMs). While RL is essential for LLMs' reasoning capabilities, it is resource-intensive, requiring substantial GPU memory and long rollout durations. QeRL addresses these issues by combining NVFP4 quantization with Low-Rank Adaptation (LoRA), accelerating the rollout phase of RL while reducing memory overhead. Beyond efficiency, our findings show that quantization noise increases policy entropy, enhancing exploration and enabling the discovery of better strategies during RL. To further optimize exploration, QeRL introduces an Adaptive Quantization Noise (AQN) mechanism, which dynamically adjusts noise during training. Experiments demonstrate that QeRL delivers over a 1.5x speedup in the rollout phase. Moreover, it is the first framework to enable RL training of a 32B LLM on a single H100 80GB GPU, while delivering overall speedups for RL training. It also achieves faster reward growth and higher final accuracy than 16-bit LoRA and QLoRA, while matching the performance of full-parameter fine-tuning on mathematical benchmarks such as GSM8K (90.8%) and MATH 500 (77.4%) for the 7B model. These results establish QeRL as an efficient and effective framework for RL training of LLMs.
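To make the setup concrete, here is a minimal sketch of training LoRA adapters on top of a 4-bit quantized base model, in the spirit of the framework the abstract describes. Note the hedges: the paper uses NVFP4 kernels, which bitsandbytes does not expose, so NF4 is used here as a stand-in; the model ID, LoRA rank, and target modules are illustrative assumptions, not the authors' configuration.

```python
# Sketch: 4-bit quantized base model + trainable LoRA adapters.
# NF4 is a stand-in for the paper's NVFP4; model ID and hyperparameters are assumed.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weight quantization
    bnb_4bit_quant_type="nf4",              # stand-in for NVFP4
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B",                      # assumed base model
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16, lora_alpha=32,                    # assumed rank and scaling
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # only LoRA params receive gradients
model.print_trainable_parameters()
```

The AQN mechanism can likewise be illustrated with a toy schedule: treat noise as an exploration knob that is large early in training and decays later. The exponential schedule and all names below are assumptions for illustration; per the abstract, the actual mechanism adapts the noise that quantization itself induces rather than injecting an external Gaussian term.

```python
# Toy sketch of an adaptive noise schedule in the spirit of AQN.
# Schedule shape, function names, and constants are illustrative assumptions.
import numpy as np

def aqn_sigma(step, total_steps=10_000, sigma_start=1e-2, sigma_end=1e-4):
    """Exponentially interpolate the noise scale from sigma_start to sigma_end."""
    frac = min(step / total_steps, 1.0)
    return sigma_start * (sigma_end / sigma_start) ** frac

def perturb_weights(weights, step, rng):
    """Add zero-mean Gaussian noise at the current schedule scale."""
    return weights + rng.normal(0.0, aqn_sigma(step), size=weights.shape)

rng = np.random.default_rng(0)
w = np.ones((4, 4), dtype=np.float32)              # placeholder weight block
print(aqn_sigma(0), aqn_sigma(10_000))             # 0.01 -> 0.0001
w_early = perturb_weights(w, step=100, rng=rng)    # noisier: more exploration
w_late = perturb_weights(w, step=9_900, rng=rng)   # quieter: more exploitation
```

The design intuition, per the abstract, is that higher effective noise raises policy entropy and thus exploration during rollouts, so annealing it trades exploration early for stable exploitation late.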
