Online Tutorial | Up to 4x Faster Generation Speed: DiffusionGemma Can Generate Entire Blocks of Text Simultaneously, With Continuous Optimization Based on multi-round Parallel denoising.

On June 11th, Google officially open-sourced DiffusionGemma, a text generation model built on Discrete Diffusion technology. It leverages the industry-leading intelligence-per-parameter capabilities of the Gemma 4 series and cutting-edge Gemini Diffusion research, integrating a new Diffusion Head to maximize generation speed. Unlike traditional large models that output text token by token, it can generate entire text blocks simultaneously and continuously optimize the results through multiple rounds of parallel denoising.This results in a generation speed increase of up to 4 times.

Official data shows that DiffusionGemma can achieve a generation speed of over 1100 Tokens/s on a single NVIDIA H100 GPU and over 700 Tokens/s on a GeForce RTX 5090, far exceeding autoregressive models of the same level.

From the perspective of architecture,DiffusionGemma employs a 26B parameter-level hybrid expert (MoE) design.The total number of parameters is approximately 25.2 billion, but only 3.8B parameters are activated during inference, significantly reducing computational overhead while maintaining strong inference capabilities. The model is built on an encoder-decoder structure and incorporates a bidirectional attention mechanism, enabling it to process 256 tokens in parallel at once. It also supports tasks that heavily rely on the global context, such as inline text editing, code completion, and mathematical structure generation.

In addition, DiffusionGemma supports long contexts of up to 256K tokens, multimodal graph and text input, and inference modes activated by <|think|>, providing developers with new technology options for exploring next-generation high-efficiency AI applications.

Although Google still emphasizes that the standard Gemma 4 is more suitable for production environments in terms of generated quality, the diffusion-based text generation capabilities demonstrated by DiffusionGemma may be opening up another noteworthy new path for the development of large language models.

To make it easy for developers to experience DiffusionGemma with minimal effort, HyperAI quickly followed up after the model was open-sourced and has now launched an easy-to-deploy Notebook, which can verify the model's powerful capabilities using only a single NVIDIA RTX Pro 6000 graphics card.

Run online:https://go.hyper.ai/879dB

More online tutorials:

https://hyper.ai/notebooks

Demo Run

1. After entering the hyper.ai homepage, select the "Tutorials" page, or click "View More Tutorials", select "DiffusionGemma: High-Speed Text Generation Model Based on Discrete Diffusion", and click "Run this tutorial".

2. After the page redirects, click "Clone" in the upper right corner to clone the tutorial into your own container.

Note: You can switch languages in the upper right corner of the page. Currently, Chinese and English are available. This tutorial will show the steps in English.

3. Select the "NVIDIA RTX Pro 6000" and "vLLM" images, and click "Continue job execution".

4. Wait for resources to be allocated. Once the status changes to "Running", click "Open Workspace" to enter the Jupyter Workspace.

Effect display

1. After the page redirects, click on the README file on the left, and then click on Run at the top.

2. After the process is complete, click the API address on the right to open the Demo interface.

HyperAI

Online Tutorial | Up to 4x Faster Generation Speed: DiffusionGemma Can Generate Entire Blocks of Text Simultaneously, With Continuous Optimization Based on multi-round Parallel denoising.

2 months ago

Information

Artificial Intelligence

Machine Learning

Deep Learning

Run online:https://go.hyper.ai/879dB

More online tutorials:

https://hyper.ai/notebooks

Demo Run

2. After the page redirects, click "Clone" in the upper right corner to clone the tutorial into your own container.

Note: You can switch languages in the upper right corner of the page. Currently, Chinese and English are available. This tutorial will show the steps in English.

3. Select the "NVIDIA RTX Pro 6000" and "vLLM" images, and click "Continue job execution".

4. Wait for resources to be allocated. Once the status changes to "Running", click "Open Workspace" to enter the Jupyter Workspace.

Effect display

1. After the page redirects, click on the README file on the left, and then click on Run at the top.

2. After the process is complete, click the API address on the right to open the Demo interface.

Online Tutorial | Up to 4x Faster Generation Speed: DiffusionGemma Can Generate Entire Blocks of Text Simultaneously, With Continuous Optimization Based on multi-round Parallel denoising.

2 months ago

Information

Artificial Intelligence

Machine Learning

Deep Learning

Run online:https://go.hyper.ai/879dB

More online tutorials:

https://hyper.ai/notebooks

Demo Run

2. After the page redirects, click "Clone" in the upper right corner to clone the tutorial into your own container.

Note: You can switch languages in the upper right corner of the page. Currently, Chinese and English are available. This tutorial will show the steps in English.

3. Select the "NVIDIA RTX Pro 6000" and "vLLM" images, and click "Continue job execution".

4. Wait for resources to be allocated. Once the status changes to "Running", click "Open Workspace" to enter the Jupyter Workspace.

Effect display

1. After the page redirects, click on the README file on the left, and then click on Run at the top.

2. After the process is complete, click the API address on the right to open the Demo interface.

Command Palette

Online Tutorial | Up to 4x Faster Generation Speed: DiffusionGemma Can Generate Entire Blocks of Text Simultaneously, With Continuous Optimization Based on multi-round Parallel denoising.

Demo Run

Effect display

Command Palette

Online Tutorial | Up to 4x Faster Generation Speed: DiffusionGemma Can Generate Entire Blocks of Text Simultaneously, With Continuous Optimization Based on multi-round Parallel denoising.

Demo Run

Effect display

Related News

Online Tutorial | 16GB Laptop Achieves Nearly 26B MoE Performance: Gemma 4 12B Based on Innovative Architecture for Unified Processing of Text/Image/Sound Modalities

OpenAI Releases GeneBench-Pro, Which Assesses AI Research Capabilities Across 129 Questions and 10 domains.

4-step Image output/4K quality/6x Speedup, PiD Uses Pixel Diffusion to Unify Decoding and super-resolution Output; SA-3DAO: a Dataset Containing 1000 Pairs of Real Images Paired With Handcrafted 3D Meshes by artists.

Dataset Summary | NVIDIA Open Sources Nemotron Datasets: Over 10TB of Tokens + 40M Training Samples, Covering Mathematical Reasoning, Code Generation, and Multilingual dialogue.

Google Releases TabFM-1.0.0-PyTorch: a zero-shot Prediction Model Designed for Mixed Tabular Data; NVIDIA open-sources Multinational Synthetic Character Dataset, With Tens of Millions of Characters available.

ICML 26 Outstanding Papers: Tsinghua JustGRPO Overcomes the dLLM Inference Bottleneck; Say Goodbye to Simple Instruction Tests: Agents Last Exam Comprehensively Evaluates the long-range Professional Capabilities of Intelligent agents.

Online Tutorial | In-depth Guide to Instruction Following/Inference/Coding: Mistral Medium 3.5 Brings Coding Agents to the Cloud

Free CPU Online Tutorial | Hermes Agent: Learn Long-Term Memory? The Memory Enhancement Plugin TencentDB Agent Memory Can Store Facts, Preferences, Task States, etc., separately.

Online Tutorial | NVIDIA Open Source LocateAnything, a 3B Model That Enables Image and Video Target Pointing, Open Vocabulary Object Detection, Target Localization, OCR Text Localization, and Other functions.

Command Palette

Online Tutorial | Up to 4x Faster Generation Speed: DiffusionGemma Can Generate Entire Blocks of Text Simultaneously, With Continuous Optimization Based on multi-round Parallel denoising.

Demo Run

Effect display

Related News

Online Tutorial | 16GB Laptop Achieves Nearly 26B MoE Performance: Gemma 4 12B Based on Innovative Architecture for Unified Processing of Text/Image/Sound Modalities

OpenAI Releases GeneBench-Pro, Which Assesses AI Research Capabilities Across 129 Questions and 10 domains.

4-step Image output/4K quality/6x Speedup, PiD Uses Pixel Diffusion to Unify Decoding and super-resolution Output; SA-3DAO: a Dataset Containing 1000 Pairs of Real Images Paired With Handcrafted 3D Meshes by artists.

Dataset Summary | NVIDIA Open Sources Nemotron Datasets: Over 10TB of Tokens + 40M Training Samples, Covering Mathematical Reasoning, Code Generation, and Multilingual dialogue.

Google Releases TabFM-1.0.0-PyTorch: a zero-shot Prediction Model Designed for Mixed Tabular Data; NVIDIA open-sources Multinational Synthetic Character Dataset, With Tens of Millions of Characters available.

ICML 26 Outstanding Papers: Tsinghua JustGRPO Overcomes the dLLM Inference Bottleneck; Say Goodbye to Simple Instruction Tests: Agents Last Exam Comprehensively Evaluates the long-range Professional Capabilities of Intelligent agents.

Online Tutorial | In-depth Guide to Instruction Following/Inference/Coding: Mistral Medium 3.5 Brings Coding Agents to the Cloud

Free CPU Online Tutorial | Hermes Agent: Learn Long-Term Memory? The Memory Enhancement Plugin TencentDB Agent Memory Can Store Facts, Preferences, Task States, etc., separately.

Online Tutorial | NVIDIA Open Source LocateAnything, a 3B Model That Enables Image and Video Target Pointing, Open Vocabulary Object Detection, Target Localization, OCR Text Localization, and Other functions.

Related News

Online Tutorial | 16GB Laptop Achieves Nearly 26B MoE Performance: Gemma 4 12B Based on Innovative Architecture for Unified Processing of Text/Image/Sound Modalities

OpenAI Releases GeneBench-Pro, Which Assesses AI Research Capabilities Across 129 Questions and 10 domains.

4-step Image output/4K quality/6x Speedup, PiD Uses Pixel Diffusion to Unify Decoding and super-resolution Output; SA-3DAO: a Dataset Containing 1000 Pairs of Real Images Paired With Handcrafted 3D Meshes by artists.

Dataset Summary | NVIDIA Open Sources Nemotron Datasets: Over 10TB of Tokens + 40M Training Samples, Covering Mathematical Reasoning, Code Generation, and Multilingual dialogue.

Google Releases TabFM-1.0.0-PyTorch: a zero-shot Prediction Model Designed for Mixed Tabular Data; NVIDIA open-sources Multinational Synthetic Character Dataset, With Tens of Millions of Characters available.

ICML 26 Outstanding Papers: Tsinghua JustGRPO Overcomes the dLLM Inference Bottleneck; Say Goodbye to Simple Instruction Tests: Agents Last Exam Comprehensively Evaluates the long-range Professional Capabilities of Intelligent agents.

Online Tutorial | In-depth Guide to Instruction Following/Inference/Coding: Mistral Medium 3.5 Brings Coding Agents to the Cloud

Free CPU Online Tutorial | Hermes Agent: Learn Long-Term Memory? The Memory Enhancement Plugin TencentDB Agent Memory Can Store Facts, Preferences, Task States, etc., separately.

Online Tutorial | NVIDIA Open Source LocateAnything, a 3B Model That Enables Image and Video Target Pointing, Open Vocabulary Object Detection, Target Localization, OCR Text Localization, and Other functions.

Related News

Online Tutorial | 16GB Laptop Achieves Nearly 26B MoE Performance: Gemma 4 12B Based on Innovative Architecture for Unified Processing of Text/Image/Sound Modalities

OpenAI Releases GeneBench-Pro, Which Assesses AI Research Capabilities Across 129 Questions and 10 domains.

4-step Image output/4K quality/6x Speedup, PiD Uses Pixel Diffusion to Unify Decoding and super-resolution Output; SA-3DAO: a Dataset Containing 1000 Pairs of Real Images Paired With Handcrafted 3D Meshes by artists.

Dataset Summary | NVIDIA Open Sources Nemotron Datasets: Over 10TB of Tokens + 40M Training Samples, Covering Mathematical Reasoning, Code Generation, and Multilingual dialogue.

Google Releases TabFM-1.0.0-PyTorch: a zero-shot Prediction Model Designed for Mixed Tabular Data; NVIDIA open-sources Multinational Synthetic Character Dataset, With Tens of Millions of Characters available.

ICML 26 Outstanding Papers: Tsinghua JustGRPO Overcomes the dLLM Inference Bottleneck; Say Goodbye to Simple Instruction Tests: Agents Last Exam Comprehensively Evaluates the long-range Professional Capabilities of Intelligent agents.

Online Tutorial | In-depth Guide to Instruction Following/Inference/Coding: Mistral Medium 3.5 Brings Coding Agents to the Cloud

Free CPU Online Tutorial | Hermes Agent: Learn Long-Term Memory? The Memory Enhancement Plugin TencentDB Agent Memory Can Store Facts, Preferences, Task States, etc., separately.

Online Tutorial | NVIDIA Open Source LocateAnything, a 3B Model That Enables Image and Video Target Pointing, Open Vocabulary Object Detection, Target Localization, OCR Text Localization, and Other functions.