GOT-OCR-2.0: The World's First Universal End-to-End OCR Model

Project Introduction
GOT-OCR-2.0 is a unified end-to-end model based on General OCR Theory that aims to improve the accuracy and efficiency of optical character recognition (OCR). It was jointly released by StepFun, Megvii Technology, the University of Chinese Academy of Sciences, and Tsinghua University, and is described in the paper "General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model". Its integrated architecture handles diverse and complex text and suits application scenarios such as scene text and document recognition. Beyond scene text recognition, GOT-OCR-2.0 can also process multi-page documents, bringing more flexibility to the OCR field.
Features of GOT-OCR-2.0 include:
- Strong versatility: Based on general OCR theory, it can process scene text and complex document structures such as tables and formulas.
- End-to-end model: The unified architecture simplifies the OCR process by covering everything from image input to text output in a single model.
- Efficient performance: Flash-Attention is integrated to improve recognition speed and performance.
- Multi-platform support: Supports CUDA acceleration and integrates with GOT-OCR2.0 to load pre-trained models.
- Widely applicable: Suitable for a wide range of application scenarios such as multi-page documents and scene text.
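
Outside the platform page described later, the "image in, text out" flow from the feature list can be sketched in Python. This is a minimal sketch under stated assumptions, not the project's own setup: it assumes the checkpoint is published on Hugging Face under the repository id `stepfun-ai/GOT-OCR2_0`, that `transformers` and `torch` are installed, and that a CUDA GPU is available. The `chat` method comes from the checkpoint's remote code rather than from `transformers` itself, and the helper names `pick_ocr_type`, `load_got_ocr`, and `recognize` are hypothetical.

```python
from typing import Any, Tuple

# Assumption: Hugging Face repository id of the released checkpoint.
DEFAULT_CHECKPOINT = "stepfun-ai/GOT-OCR2_0"


def pick_ocr_type(formatted: bool) -> str:
    """Map a flag to the `ocr_type` string: plain text or formatted output."""
    return "format" if formatted else "ocr"


def load_got_ocr(checkpoint: str = DEFAULT_CHECKPOINT) -> Tuple[Any, Any]:
    """Load the tokenizer and model (needs transformers, torch, and a CUDA GPU)."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        checkpoint,
        trust_remote_code=True,  # the OCR logic lives in the checkpoint's own code
        low_cpu_mem_usage=True,
        device_map="cuda",
        use_safetensors=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer, model.eval()


def recognize(tokenizer: Any, model: Any, image_path: str, formatted: bool = False) -> str:
    """Run one image through the model and return the recognized text."""
    # `chat` is provided by the checkpoint's remote code, not by transformers.
    return model.chat(tokenizer, image_path, ocr_type=pick_ocr_type(formatted))
```

A typical call would be `tokenizer, model = load_got_ocr()` followed by `recognize(tokenizer, model, "page.png", formatted=True)`, where `page.png` is a placeholder path. Keeping the heavy imports inside `load_got_ocr` lets the helpers be imported on a machine without a GPU.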
Example results
[Figures: example recognition results]
Run steps
1. Click "Clone" in the upper right corner of the project, then click "Next" through each step: Basic Information > Select Computing Power > Review. Finally, click "Continue" to open the project in your personal container.
2. Once resources have been allocated, the model is initialized automatically in the background. You can then open the operation page directly through the API address provided by the platform. Real-name authentication must already be completed; you do not need to open the workspace for this step.

3. Upload the target image
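
Before uploading, it can help to confirm locally that the file really is an image. The sketch below is an illustration only: the source does not say which formats or file sizes the operation page accepts, so the PNG/JPEG restriction and the 10 MiB limit are assumptions, and the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Leading "magic" bytes of two common image formats.
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return "png" or "jpeg" when the header bytes match, otherwise None."""
    for signature, name in _SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


def check_before_upload(path: str, max_bytes: int = 10 * 1024 * 1024) -> str:
    """Return the detected image type, or raise ValueError if the file looks unsuitable."""
    raw = Path(path).read_bytes()
    kind = sniff_image_type(raw)
    if kind is None:
        raise ValueError(f"{path} does not look like a PNG or JPEG image")
    if len(raw) > max_bytes:  # assumed limit, not taken from the platform
        raise ValueError(f"{path} is {len(raw)} bytes, above the {max_bytes}-byte limit")
    return kind
```

Checking the header bytes rather than the file extension catches renamed or truncated files that would otherwise fail only after the upload.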
