dots.ocr: a Multilingual Document Parsing Model
1. Tutorial Introduction
dots.ocr is a multilingual document layout parsing model released by Xiaohongshu's hi lab in August 2025. Built on a 1.7-billion-parameter vision-language model (VLM), it performs layout detection and content recognition in a single model while preserving reading order. Despite its small size, it reports state-of-the-art results on benchmarks such as OmniDocBench. Its formula recognition is comparable to that of much larger models such as Doubao-1.5 and Gemini2.5-Pro, and it shows clear advantages when parsing low-resource languages. The architecture is simple and efficient: switching tasks requires only a change in the input prompt. Its fast inference speed makes it suitable for a wide range of document parsing scenarios.
This tutorial runs on a single RTX 4090 GPU.
2. Project Examples
Formula Document Example

Table Document Example

Multilingual Document Example



3. Operation Steps
1. After starting the container, click the API address to open the web interface

2. Usage steps
If "Bad Gateway" is displayed, the model is still initializing. Because the model is large, wait about 2-3 minutes and then refresh the page.
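Instead of refreshing by hand, the wait can be scripted. The following is a minimal sketch using only the Python standard library; it assumes the web interface answers with HTTP 200 once the model has loaded and with an error status such as 502 (Bad Gateway) before that. The function names and the example address are hypothetical and not part of dots.ocr.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=10.0):
    """Return the HTTP status code of a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Error responses (for example 502 while the model loads) land here.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or socket timeout.
        return 0


def wait_until_ready(url, max_wait_s=300.0, interval_s=10.0,
                     get_status=http_status, sleep=time.sleep):
    """Poll url until it returns 200. True if ready, False on timeout."""
    waited = 0.0
    while True:
        if get_status(url) == 200:
            return True
        if waited >= max_wait_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

For example, `wait_until_ready("http://localhost:7860")` would poll a hypothetical local address every 10 seconds for up to 5 minutes; substitute the API address shown for your container.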

Parameter Description
- Select Prompt:
  - layout_all_en: Detects the layout elements in an image and recognizes their content, preserving the original layout structure and reading order.
  - layout_only_en: Detects layout elements only (positions and categories) without recognizing the text inside them.
  - OCR: Recognizes the text in an image without preserving the layout structure.
- Advanced Settings:
  - Enable fitz_preprocess for images: Whether to preprocess input images with fitz before parsing. Recommended if the image DPI is low.
  - Min Pixels: The lower bound on an image's total pixel count (width × height); used to handle images that are too small.
  - Max Pixels: The upper bound on an image's total pixel count (width × height); used to handle images that are too large.
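One way to read Min Pixels and Max Pixels is as bounds on the total pixel count (width × height) that an image is rescaled into before inference, as many vision-language model preprocessors do. The sketch below illustrates that interpretation only. The helper name `fit_to_pixel_bounds` and the rounding factor of 28 are assumptions for illustration and are not confirmed by this tutorial or taken from the dots.ocr code.

```python
import math


def fit_to_pixel_bounds(width, height, min_pixels, max_pixels, factor=28):
    """Return a (width, height) whose area lies within the pixel bounds.

    Each side is rounded to a multiple of `factor` (an assumed patch
    granularity), and the aspect ratio is approximately preserved.
    """
    if min_pixels > max_pixels:
        raise ValueError("min_pixels must not exceed max_pixels")

    # Start from the nearest multiples of `factor`.
    w = max(factor, round(width / factor) * factor)
    h = max(factor, round(height / factor) * factor)

    if w * h > max_pixels:
        # Too large: shrink both sides by the same ratio, rounding down.
        scale = math.sqrt((width * height) / max_pixels)
        w = max(factor, math.floor(width / scale / factor) * factor)
        h = max(factor, math.floor(height / scale / factor) * factor)
    elif w * h < min_pixels:
        # Too small: enlarge both sides by the same ratio, rounding up.
        scale = math.sqrt(min_pixels / (width * height))
        w = math.ceil(width * scale / factor) * factor
        h = math.ceil(height * scale / factor) * factor
    return w, h


# A 4000x3000 scan with a 1,000,000-pixel upper bound is scaled down.
print(fit_to_pixel_bounds(4000, 3000, 3136, 1_000_000))
```

Under this reading, raising Max Pixels keeps more detail from large scans at the cost of memory and speed, while raising Min Pixels enlarges tiny inputs so that small text occupies more pixels.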
