HyperAI

Online Tutorial | Can It Run on Consumer-grade Graphics Cards? The Lightweight Model OCRFlux-3B Achieves Intelligent Recognition of Complex Text for the First Time


When converting PDFs, you no longer have to worry about formulas, tables, and cross-page text breaking the layout!

The ChatDOC team has released OCRFlux-3B, a toolkit based on a multimodal large language model that converts PDFs and images into clean, readable, plain-text Markdown. It supports batch document parsing, structured information extraction, and merging of content that spans multiple pages.

OCRFlux-3B provides page-level text conversion that accurately converts the text in PDFs and images into Markdown. It handles complex tables (repeated headers, cells spanning rows or columns, horizontal paging, and nested structures), recognizes complex formulas in papers, and supports merging tables and paragraphs that continue across pages. The natural reading order of the text is preserved even in complex multi-column layouts with graphics and inserts. Once a PDF has been converted into editable, searchable Markdown, researchers can quickly extract its tables and formulas. OCRFlux-3B is currently the first model among open-source OCR projects to achieve this capability.
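As a rough illustration of that last step, the sketch below pulls tables and display formulas out of a converted Markdown string. It is not part of OCRFlux. It assumes that tables appear as HTML `<table>` blocks embedded in the Markdown and that display formulas are delimited by `$$`; check the actual output of the tool before relying on either assumption. The function names and the sample text are made up for this example.

```python
import re
from typing import List

# Assumption: tables are emitted as HTML <table>...</table> blocks.
TABLE_RE = re.compile(r"<table\b.*?</table>", re.DOTALL | re.IGNORECASE)
# Assumption: display formulas are wrapped in $$ ... $$.
FORMULA_RE = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)


def extract_tables(markdown: str) -> List[str]:
    """Return every HTML table block found in the Markdown text."""
    return TABLE_RE.findall(markdown)


def extract_formulas(markdown: str) -> List[str]:
    """Return the LaTeX body of every $$-delimited display formula."""
    return [body.strip() for body in FORMULA_RE.findall(markdown)]


# Made-up sample standing in for converted output.
sample = """# Notes

The energy relation is

$$
E = mc^2
$$

<table><tr><th>Symbol</th><th>Meaning</th></tr><tr><td>m</td><td>mass</td></tr></table>
"""

tables = extract_tables(sample)
formulas = extract_formulas(sample)
print(len(tables), "table(s);", len(formulas), "formula(s)")
```

Regular expressions are enough for a quick pass like this one; nested tables or inline `$...$` formulas would need a real HTML or Markdown parser.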

OCRFlux-3B is a lightweight model fine-tuned from the Qwen2.5-VL-3B-Instruct vision-language model, so it can run on consumer-grade graphics cards such as the RTX 3090.
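A back-of-the-envelope calculation suggests why a model of this size fits on such a card. The sketch below estimates the memory taken by the weights alone, assuming roughly 3 billion parameters (inferred from the "3B" in the name, not an exact count) and comparing against the 24 GB of memory on an RTX 3090. Activations, the KV cache, and framework overhead are not included, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the model weights, in GiB."""
    return num_params * bytes_per_param / 1024**3


PARAMS = 3e9          # assumption: about 3 billion parameters
CARD_MEMORY_GIB = 24  # RTX 3090 memory size, treated here as 24 GiB

fp16_gib = weight_memory_gib(PARAMS, 2)  # 16-bit weights (fp16 / bf16)
fp32_gib = weight_memory_gib(PARAMS, 4)  # 32-bit weights

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f"32-bit weights: {fp32_gib:.1f} GiB")
print("Fits in 16-bit:", fp16_gib < CARD_MEMORY_GIB)
```

At 16-bit precision the weights take under 6 GiB, which leaves most of the card free for image inputs and generation.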

The "OCRFlux-3B: Intelligent Text Recognition Toolkit" tutorial is now available in the Tutorials section of HyperAI's official website (hyper.ai). With one-click deployment, you can try converting PDF documents, image documents, and multiple files into searchable Markdown text online. Come and try it!

Tutorial Link:

https://go.hyper.ai/0K2OY

HyperAI exclusive invitation link (copy and open in browser):

https://openbayes.com/console/signup?r=Ada0322_NR0n

Demo Run

1. After entering the hyper.ai homepage, select the "Tutorials" page, select "OCRFlux-3B: Intelligent Text Recognition Toolkit", and click "Run this tutorial online".

2. After the page jumps, click "Clone" in the upper right corner to clone the tutorial into your own container.

3. Select "NVIDIA GeForce RTX 4090". The OpenBayes platform offers 4 billing methods: pay as you go, daily, weekly, or monthly. Choose the one that suits your needs, select the "PyTorch" image, and click "Continue". New users can register using the invitation link below to get 4 hours of RTX 4090 time and 5 hours of CPU time for free!

HyperAI exclusive invitation link (copy and open in browser):

https://openbayes.com/console/signup?r=Ada0322_NR0n

4. Wait for resources to be allocated. The first clone takes about 2 minutes. When the status changes to "Running", click the jump arrow next to "API Address" to open the Demo page. Note that users must complete real-name authentication before they can access the API address.

Effect Demonstration

Click the API address to open the demo page. There, upload a PDF document, an image document, or multiple files, then click "Process" to generate the corresponding Markdown text. The effect is as follows:

OCRFlux-3B can also easily identify the tables and formulas interspersed in the paper:

That is the recommended tutorial for this issue. Everyone is welcome to try it out ⬇️

Tutorial Link:

https://go.hyper.ai/0K2OY