1. Tutorial Introduction

RolmOCR is an open source OCR tool developed by the Reducto AI team in April 2025, based on the Qwen2.5-VL-7B visual language model. It can extract text from images and PDFs quickly and with low memory, outperforming similar tools such as olmOCR. RolmOCR does not rely on PDF metadata, simplifies the process and supports multiple document types, such as handwritten notes and academic papers. The Reducto team aims to improve the efficiency of document digitization through model updates and training data optimization.

This tutorial uses RolmOCR as a demonstration, the image uses vllm 0.7.3-2204, and the computing resource uses RTX 4090.

2. Function List

Fast text extraction: Extract text from images and PDFs with fast processing speed, suitable for large amounts of documents.

Supports a variety of documents: can recognize handwritten notes, printed documents and complex tables.

Open source and free: Released under the Apache 2.0 license, the code can be freely downloaded and adapted.

Low memory usage: It is more resource-efficient than olmOCR and has low computer requirements when running.

No metadata required: Work directly with the original document without relying on additional information from the PDF.

Enhanced tilted document recognition: 15% is rotated in the training data to improve the adaptability to documents with non-positive angles.

Based on the latest model: Using Qwen2.5-VL-7B to improve recognition accuracy and efficiency.

Exchange and discussion

🖌️ If you see a high-quality project, please leave a message in the background to recommend it! In addition, we have also established a tutorial exchange group. Welcome friends to scan the QR code and remark [SD Tutorial] to join the group to discuss various technical issues and share application effects↓

HyperAI

Run this Notebook Discuss on Discord

Date

10 months ago

Size

360.51 MB

1. Tutorial Introduction

This tutorial uses RolmOCR as a demonstration, the image uses vllm 0.7.3-2204, and the computing resource uses RTX 4090.

2. Function List

Fast text extraction: Extract text from images and PDFs with fast processing speed, suitable for large amounts of documents.
Supports a variety of documents: can recognize handwritten notes, printed documents and complex tables.
Open source and free: Released under the Apache 2.0 license, the code can be freely downloaded and adapted.
Low memory usage: It is more resource-efficient than olmOCR and has low computer requirements when running.
No metadata required: Work directly with the original document without relying on additional information from the PDF.
Enhanced tilted document recognition: 15% is rotated in the training data to improve the adaptability to documents with non-positive angles.
Based on the latest model: Using Qwen2.5-VL-7B to improve recognition accuracy and efficiency.

3. Operation steps

1. After starting the container, click the API address to enter the Web interface

If "Bad Gateway" is displayed, it means the model is initializing. Please wait for about 1-2 minutes and refresh the page.

2. Functional Demonstration

Citation Information

Thanks to GitHub user boyswu For the production of this tutorial, the project reference information is as follows:

@misc{RolmOCR,
  author = {Reducto AI},
  title = {RolmOCR: A Faster, Lighter Open Source OCR Model},
  year = {2025},
}

Exchange and discussion

This notebook is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.

Related Notebooks

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

1. Tutorial Introduction

This tutorial uses RolmOCR as a demonstration, the image uses vllm 0.7.3-2204, and the computing resource uses RTX 4090.

2. Function List

Fast text extraction: Extract text from images and PDFs with fast processing speed, suitable for large amounts of documents.

Supports a variety of documents: can recognize handwritten notes, printed documents and complex tables.

Open source and free: Released under the Apache 2.0 license, the code can be freely downloaded and adapted.

Low memory usage: It is more resource-efficient than olmOCR and has low computer requirements when running.

No metadata required: Work directly with the original document without relying on additional information from the PDF.

Enhanced tilted document recognition: 15% is rotated in the training data to improve the adaptability to documents with non-positive angles.

Based on the latest model: Using Qwen2.5-VL-7B to improve recognition accuracy and efficiency.

Exchange and discussion

Command Palette

RolmOCR Cross-scenario ultra-fast OCR Open Source Recognition New Benchmark

1. Tutorial Introduction

2. Function List

3. Operation steps

1. After starting the container, click the API address to enter the Web interface

2. Functional Demonstration

Citation Information

Exchange and discussion

Build AI with AI

HyperAI Newsletters

Command Palette

RolmOCR Cross-scenario ultra-fast OCR Open Source Recognition New Benchmark

1. Tutorial Introduction

2. Function List

3. Operation steps

1. After starting the container, click the API address to enter the Web interface

2. Functional Demonstration

Citation Information

Exchange and discussion

Related Notebooks

Chandra: High-precision Document OCR

GLM-OCR Lightweight Multimodal OCR Recognition System

HunyuanOCR: Tencent Hunyuan End-to-End OCR

LightOnOCR-1B-Interface: A high-speed OCR Engine for Complex Documents

DeepSeek-OCR 2 Visual Causal Flow

LightOnOCR-2-1B Lightweight, High-Performance End-to-End OCR Model

PaddleOCR-VL-1.5: Local OCR Based on vLLM

OCRFlux-3B: Intelligent Text Recognition Toolkit

MarkItDown, Microsoft's open-source Document Conversion Tool

Build AI with AI

HyperAI Newsletters

Command Palette

RolmOCR Cross-scenario ultra-fast OCR Open Source Recognition New Benchmark

1. Tutorial Introduction

2. Function List

3. Operation steps

1. After starting the container, click the API address to enter the Web interface

2. Functional Demonstration

Citation Information

Exchange and discussion

Related Notebooks

Chandra: High-precision Document OCR

GLM-OCR Lightweight Multimodal OCR Recognition System

HunyuanOCR: Tencent Hunyuan End-to-End OCR

LightOnOCR-1B-Interface: A high-speed OCR Engine for Complex Documents

DeepSeek-OCR 2 Visual Causal Flow

LightOnOCR-2-1B Lightweight, High-Performance End-to-End OCR Model

PaddleOCR-VL-1.5: Local OCR Based on vLLM

OCRFlux-3B: Intelligent Text Recognition Toolkit

MarkItDown, Microsoft's open-source Document Conversion Tool

Build AI with AI

HyperAI Newsletters

Related Notebooks

Chandra: High-precision Document OCR

GLM-OCR Lightweight Multimodal OCR Recognition System

HunyuanOCR: Tencent Hunyuan End-to-End OCR

LightOnOCR-1B-Interface: A high-speed OCR Engine for Complex Documents

DeepSeek-OCR 2 Visual Causal Flow

LightOnOCR-2-1B Lightweight, High-Performance End-to-End OCR Model

PaddleOCR-VL-1.5: Local OCR Based on vLLM

OCRFlux-3B: Intelligent Text Recognition Toolkit

MarkItDown, Microsoft's open-source Document Conversion Tool

Related Notebooks

Chandra: High-precision Document OCR

GLM-OCR Lightweight Multimodal OCR Recognition System

HunyuanOCR: Tencent Hunyuan End-to-End OCR

LightOnOCR-1B-Interface: A high-speed OCR Engine for Complex Documents

DeepSeek-OCR 2 Visual Causal Flow

LightOnOCR-2-1B Lightweight, High-Performance End-to-End OCR Model

PaddleOCR-VL-1.5: Local OCR Based on vLLM

OCRFlux-3B: Intelligent Text Recognition Toolkit

MarkItDown, Microsoft's open-source Document Conversion Tool