OCRFlux-3B: Intelligent Text Recognition Toolkit
1. Tutorial Introduction

OCRFlux-3B is a toolkit built on a multimodal large language model, released by the ChatDOC team on June 17, 2025. It converts PDFs and images into clean, readable plain-text Markdown. Beyond page-level conversion, it can also merge tables and paragraphs that span multiple pages, which makes it well suited to documents with complex structure.
This tutorial runs on a single RTX 4090 GPU. The project provides three demonstration examples: PDF Document, Image Document, and Multiple Files.
2. Project Examples
PDF Document

Image Document

Multiple Files

3. Operation Steps
1. After starting the container, click the API address to enter the Web interface

2. Usage steps
If the page shows "Bad Gateway", the model is still initializing. Because the model is large, wait about 2-3 minutes and then refresh the page.
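Instead of refreshing by hand, the wait can be scripted. The following is a minimal sketch, not part of the project: it assumes the web interface answers with HTTP 200 once the model has loaded and with an error such as 502 before that. The function names, the 300-second limit, and the 10-second interval are illustrative choices, and the URL in the usage comment is a placeholder for the API address shown after the container starts.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Raised for 4xx/5xx answers, e.g. 502 while the model is still loading.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return 0


def wait_until_ready(url, max_wait=300.0, interval=10.0,
                     fetch=fetch_status, sleep=time.sleep):
    """Poll url until it answers with HTTP 200 or max_wait seconds have been slept."""
    waited = 0.0
    while True:
        if fetch(url) == 200:
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval


# Usage (placeholder address, replace with your own API address):
# ready = wait_until_ready("http://<your-api-address>")
```

The `fetch` and `sleep` arguments are injectable only so the loop can be exercised without a network connection or real waiting.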

PDF Document

Parameter Description
- Advanced Settings:
  - Target Image Dimension: the target size of the page image generated for recognition.
  - Max Page Retries: the maximum number of times a page is retried when PDF page parsing fails.
  - Skip Cross-Page Merge: when enabled, the cross-page merging step is skipped, so tables and paragraphs that continue across pages are left unmerged.
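To make these three settings more concrete, here is a rough sketch of what options like them typically control. It is an illustration only, not OCRFlux's implementation: the field names, the default values, the helper functions, and the assumption that the target dimension applies to the longer side of the page image are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AdvancedSettings:
    # Hypothetical names mirroring the three UI options; defaults are placeholders,
    # not the tool's real defaults.
    target_image_dim: int = 1024
    max_page_retries: int = 3
    skip_cross_page_merge: bool = False


def scaled_size(width, height, target_dim):
    """Scale (width, height) so the longer side equals target_dim, keeping the aspect ratio."""
    scale = target_dim / max(width, height)
    return round(width * scale), round(height * scale)


def parse_with_retries(parse_page, page_number, max_retries):
    """Call parse_page(page_number); on failure, retry up to max_retries more times."""
    last_error = None
    for _attempt in range(max_retries + 1):
        try:
            return parse_page(page_number)
        except Exception as err:  # a real pipeline would catch narrower error types
            last_error = err
    raise RuntimeError(
        f"page {page_number} failed after {max_retries} retries"
    ) from last_error
```

Under these assumptions, a 2000 x 1000 page with a target dimension of 1024 would be rendered at 1024 x 512, and a page whose parsing keeps failing is given up on after the configured number of retries.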
Image Document

Multiple Files

4. Discussion
🖌️ If you come across a high-quality project, please leave us a message to recommend it! We have also set up a tutorial discussion group. Scan the QR code and add the note [SD Tutorial] to join the group, discuss technical issues, and share your results ↓
