OCRFlux-3B: Intelligent Text Recognition Toolkit

1. Tutorial Introduction

OCRFlux-3B is a toolkit built on a multimodal large language model and released by the ChatDOC team on June 17, 2025. It converts PDFs and images into clean, readable, plain-text Markdown. In addition to page-level text conversion, it merges tables and paragraphs that span pages, which helps when processing documents with complex structure.

This tutorial runs on a single RTX 4090 GPU. The project provides three demonstration examples: PDF Document, Image Document, and Multiple Files.

2. Project Examples

PDF Document

Image Document

Multiple Files

3. Operation Steps

1. After starting the container, click the API address to open the web interface.

2. Usage steps

If the page shows "Bad Gateway", the model is still initializing. Because the model is large, wait about 2-3 minutes and then refresh the page.
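The wait-and-refresh step can also be scripted. The following is a minimal sketch, not part of the project: it polls the API address until the server answers with HTTP 200 instead of 502 (Bad Gateway). The function names, the 15-second interval, and the 5-minute limit are illustrative assumptions.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Raised for 4xx/5xx answers, e.g. 502 while the model is still loading.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar errors.
        return 0


def wait_until_ready(url, fetch=fetch_status, interval=15.0,
                     max_wait=300.0, sleep=time.sleep):
    """Poll url until it answers 200. Return True on success, False on timeout.

    interval and max_wait are hypothetical defaults chosen to cover the
    2-3 minute initialization mentioned above with some margin.
    """
    waited = 0.0
    while True:
        if fetch(url) == 200:
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
```

To use it, call `wait_until_ready(...)` with the API address shown for your container and open the page in the browser once it returns `True`. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a network connection.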

PDF Document

Parameter Description

Advanced Settings:
- Target Image Dimension: Sets the target size of the image generated from each page.
- Max Page Retries: Maximum number of times a page is retried when parsing it fails.
- Skip Cross-Page Merge: When enabled, skips cross-page merging, the step that joins tables and paragraphs split across pages.
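To make the roles of the three settings concrete, here is a small sketch of how such options could steer a conversion loop. It is an illustration under stated assumptions, not OCRFlux's implementation: the `AdvancedSettings` class, its field names and defaults, and the `convert` and `merge` callables are all hypothetical stand-ins for the real page recognition and merging steps.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AdvancedSettings:
    # Hypothetical names and defaults; they only mirror the UI labels.
    target_image_dim: int = 1024
    max_page_retries: int = 1
    skip_cross_page_merge: bool = False


def convert_page_with_retries(convert: Callable[[int, int], str], page: int,
                              settings: AdvancedSettings) -> str:
    """Try convert(page, target_image_dim); retry up to max_page_retries times."""
    last_error = None
    for _attempt in range(settings.max_page_retries + 1):
        try:
            return convert(page, settings.target_image_dim)
        except Exception as err:  # any page-level failure triggers a retry
            last_error = err
    attempts = settings.max_page_retries + 1
    raise RuntimeError(f"page {page} failed after {attempts} attempts") from last_error


def convert_document(convert: Callable[[int, int], str],
                     merge: Callable[[List[str]], str],
                     num_pages: int,
                     settings: AdvancedSettings) -> str:
    """Convert every page, then either merge across pages or just concatenate."""
    pages = [convert_page_with_retries(convert, p, settings)
             for p in range(num_pages)]
    if settings.skip_cross_page_merge:
        # Pages stay independent: no joining of split tables or paragraphs.
        return "\n\n".join(pages)
    return merge(pages)
```

In this sketch, raising Max Page Retries trades time for robustness on pages that fail intermittently, and enabling Skip Cross-Page Merge bypasses the `merge` step entirely, so each page's Markdown is returned as converted.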

Image Document

Multiple Files
