Date

3 months ago

Size

51.85 MB

1. Tutorial Introduction

dots.ocr is a multilingual document layout parsing model released by Xiaohongshu's hi lab in August 2025. Based on a 1.7 billion-parameter visual language model (VLM), the model integrates layout detection and content recognition, maintaining a good reading order. Despite its small size, the model achieves state-of-the-art performance, achieving excellent results on benchmarks such as OmniDocBench. Its formula recognition performance rivals that of larger models like Doubao-1.5 and Gemini2.5-Pro, demonstrating significant advantages in parsing minority languages. dots.ocr offers a simple and efficient architecture, requiring only a change in the input prompt to switch tasks. Its fast inference speed makes it suitable for a variety of document parsing scenarios.

This tutorial uses a single RTX 5090 card as the resource.

2. Project Examples

Formula Document Example

Table document example

Multilingual Documentation Example

3. Operation steps

1. After starting the container, click the API address to enter the Web interface

2. Usage steps

If "Bad Gateway" is displayed, it means the model is initializing. Since the model is large, please wait about 2-3 minutes and refresh the page.

Parameter Description

Select Prompt:
- layout_all_en: Recognizes all text in an image and preserves the original layout structure.
- layout_only_en: Recognize only English text in images and ignore other languages.
- OCR: Recognize text in images without preserving structure.
Advanced Settings:
- Enable fitz_preprocess for images: Whether to enable fitz_preprocess for images. Recommended if the image DPI is low.
- Min Pixels: The minimum number of pixels in an image, used to filter out images that are too small.
- Max Pixels: The maximum number of pixels in the image, used to filter out images that are too large.

This notebook is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.

Related Notebooks

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

HyperAI

Run this Notebook

Date

3 months ago

Size

51.85 MB

1. Tutorial Introduction

This tutorial uses a single RTX 5090 card as the resource.

2. Project Examples

Formula Document Example

Table document example

Multilingual Documentation Example

3. Operation steps

1. After starting the container, click the API address to enter the Web interface

2. Usage steps

If "Bad Gateway" is displayed, it means the model is initializing. Since the model is large, please wait about 2-3 minutes and refresh the page.

Parameter Description

Select Prompt:
- layout_all_en: Recognizes all text in an image and preserves the original layout structure.
- layout_only_en: Recognize only English text in images and ignore other languages.
- OCR: Recognize text in images without preserving structure.
Advanced Settings:
- Enable fitz_preprocess for images: Whether to enable fitz_preprocess for images. Recommended if the image DPI is low.
- Min Pixels: The minimum number of pixels in an image, used to filter out images that are too small.
- Max Pixels: The maximum number of pixels in the image, used to filter out images that are too large.

Related Notebooks

Chandra: High-precision Document OCR

2 months ago

HunyuanOCR: Tencent Hunyuan End-to-End OCR

2 months ago

LightOnOCR-1B-Interface: A high-speed OCR Engine for Complex Documents

2 months ago

DeepSeek-OCR 2 Visual Causal Flow

10 days ago

LightOnOCR-2-1B Lightweight, High-Performance End-to-End OCR Model

7 days ago

PaddleOCR-VL: Multimodal Document Parsing

3 months ago

PixelReasoner-RL: Pixel-level Visual Inference Model

2 months ago

MarkItDown, Microsoft's open-source Document Conversion Tool

2 months ago

Krea-realtime-video: Real-time Video Generation Model

3 months ago

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

Command Palette

dots.ocr: a Multilingual Document Parsing Model

1. Tutorial Introduction

2. Project Examples

Formula Document Example

Table document example

Multilingual Documentation Example

3. Operation steps

1. After starting the container, click the API address to enter the Web interface

2. Usage steps

Build AI with AI

HyperAI Newsletters

Command Palette

dots.ocr: a Multilingual Document Parsing Model

1. Tutorial Introduction

2. Project Examples

Formula Document Example

Table document example

Multilingual Documentation Example

3. Operation steps

1. After starting the container, click the API address to enter the Web interface

2. Usage steps

Related Notebooks

Chandra: High-precision Document OCR

HunyuanOCR: Tencent Hunyuan End-to-End OCR

LightOnOCR-1B-Interface: A high-speed OCR Engine for Complex Documents

DeepSeek-OCR 2 Visual Causal Flow

LightOnOCR-2-1B Lightweight, High-Performance End-to-End OCR Model

PaddleOCR-VL: Multimodal Document Parsing

PixelReasoner-RL: Pixel-level Visual Inference Model

MarkItDown, Microsoft's open-source Document Conversion Tool

Krea-realtime-video: Real-time Video Generation Model

Build AI with AI

HyperAI Newsletters

Command Palette

dots.ocr: a Multilingual Document Parsing Model

1. Tutorial Introduction

2. Project Examples

Formula Document Example

Table document example

Multilingual Documentation Example

3. Operation steps

1. After starting the container, click the API address to enter the Web interface

2. Usage steps

Related Notebooks

Chandra: High-precision Document OCR

HunyuanOCR: Tencent Hunyuan End-to-End OCR

LightOnOCR-1B-Interface: A high-speed OCR Engine for Complex Documents

DeepSeek-OCR 2 Visual Causal Flow

LightOnOCR-2-1B Lightweight, High-Performance End-to-End OCR Model

PaddleOCR-VL: Multimodal Document Parsing

PixelReasoner-RL: Pixel-level Visual Inference Model

MarkItDown, Microsoft's open-source Document Conversion Tool

Krea-realtime-video: Real-time Video Generation Model

Build AI with AI

HyperAI Newsletters

Related Notebooks

Chandra: High-precision Document OCR

HunyuanOCR: Tencent Hunyuan End-to-End OCR

LightOnOCR-1B-Interface: A high-speed OCR Engine for Complex Documents

DeepSeek-OCR 2 Visual Causal Flow

LightOnOCR-2-1B Lightweight, High-Performance End-to-End OCR Model

PaddleOCR-VL: Multimodal Document Parsing

PixelReasoner-RL: Pixel-level Visual Inference Model

MarkItDown, Microsoft's open-source Document Conversion Tool

Krea-realtime-video: Real-time Video Generation Model

Related Notebooks

Chandra: High-precision Document OCR

HunyuanOCR: Tencent Hunyuan End-to-End OCR

LightOnOCR-1B-Interface: A high-speed OCR Engine for Complex Documents

DeepSeek-OCR 2 Visual Causal Flow

LightOnOCR-2-1B Lightweight, High-Performance End-to-End OCR Model

PaddleOCR-VL: Multimodal Document Parsing

PixelReasoner-RL: Pixel-level Visual Inference Model

MarkItDown, Microsoft's open-source Document Conversion Tool

Krea-realtime-video: Real-time Video Generation Model