5.2k Stars! A Breakthrough Model That Tackles the OCR Dilemma Is Here; Multilingual Medical Large Model Is Open Source, with Corpus and Benchmark Datasets Available for Download

Although OCR (optical character recognition) is now widespread, it still faces many bottlenecks. Traditional OCR models lose considerable accuracy under complex, changing conditions, and their multi-step processing pipelines are cumbersome, which greatly reduces work efficiency.
GOT-OCR-2.0, the world's first universal end-to-end OCR model, has recently been open-sourced! It addresses the limitations of traditional OCR with poor image quality, complex backgrounds, and handwritten text. A demo tutorial for the model is now available on the hyper.ai official website, so you can skip the complicated installation steps and clone it to get started.
Run online: https://go.hyper.ai/JVVKQ
From October 1st to October 12th, the hyper.ai official website added:
* High-quality tutorials: 3
* High-quality public datasets: 10
* Selected community articles: 5
* Popular encyclopedia entries: 5
* Top conferences with deadlines in October: 5
Visit the official website: hyper.ai
Selected Public Tutorials
1. GOT-OCR-2.0 The world's first universal end-to-end OCR model
GOT-OCR-2.0 is a unified end-to-end model based on General OCR Theory that aims to improve the accuracy and efficiency of optical character recognition (OCR). Its integrated architecture handles diverse and complex text efficiently. GOT-OCR-2.0 supports not only scene text recognition but also multi-page documents, bringing more flexibility to the OCR field. Run the container as described in the tutorial, then copy the API address to try out model inference.
Direct use: https://go.hyper.ai/JVVKQ

2. IC-Light image lighting tool, natural background fusion replacement
IC-Light stands for Imposing Consistent Light and uses machine learning models to relight images. It provides two main models: a text-conditioned relighting model and a background-conditioned model, which adjust the lighting of the foreground image according to a text prompt or the background content, respectively.
The project serves an interactive front end through Gradio. The relevant models and dependencies have already been deployed and can be started with one click.
Direct use: https://go.hyper.ai/1Y0PQ

3. Fish Speech v1.4 Voice Cloning-Text to Speech Tool Demo
Fish Speech is a text-to-speech (TTS) model developed by Fish Audio in 2024 that generates high-quality, natural-sounding speech. Version 1.4 was trained on approximately 700,000 hours of data and handles eight languages, including Chinese, Japanese, and English. Its language processing is close to human level, and its vocal expression is rich and varied.
This tutorial uses the latest version of the model with the environment already deployed. You can run voice cloning or text-to-speech tasks directly by following the tutorial instructions.
Direct use: https://go.hyper.ai/t7O8m
Selected Public Datasets
1. MMedC Large-Scale Multilingual Medical Corpus
The dataset contains approximately 25.5 billion tokens of medical corpus data covering 6 major languages: English, Chinese, Japanese, French, Russian, and Spanish. Support for more languages is still being added.
Direct use: https://go.hyper.ai/jXv0r

2. MMedBench Multilingual Medical Ability Test Benchmark Dataset
This dataset is designed to evaluate the progress of multilingual models in the medical field, covering 6 languages and 21 medical subfields. All questions in MMedBench come directly from medical examination question banks in various countries, which keeps the evaluation accurate and reliable and avoids biases in diagnostic understanding caused by differences between national medical practice guidelines.
Direct use: https://go.hyper.ai/8X9xD

3. Lacuna Malaria Detection Dataset
The dataset contains a total of 3,925 malaria slide images: 2,747 in the training set and 1,178 in the test set. Each image is accompanied by the slide it was captured from, the microscope stage micrometer reading, and the objective lens setting. At most 40 images were captured per slide.
Direct use: https://go.hyper.ai/9oBFv
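As a quick sanity check on the figures quoted for this dataset, the split sizes can be tallied in a few lines of Python. This is only a minimal sketch built from the counts stated above; the variable names are illustrative and do not come from the dataset itself.

```python
# Counts quoted in the dataset description (not read from the dataset files).
TRAIN_IMAGES = 2_747
TEST_IMAGES = 1_178
MAX_IMAGES_PER_SLIDE = 40

total = TRAIN_IMAGES + TEST_IMAGES
train_share = TRAIN_IMAGES / total

# With at most 40 images per slide, this is a lower bound on the slide count.
min_slides = -(-total // MAX_IMAGES_PER_SLIDE)  # ceiling division

print(f"total images: {total}")
print(f"train share: {train_share:.1%}")
print(f"minimum number of slides: {min_slides}")
```

The two splits add up to the stated total, and the training set accounts for roughly 70% of the images.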

4. HelpSteer2 Human Preference Alignment Dataset
HelpSteer2 contains about 10,000 answer pairs, an order of magnitude fewer than existing preference datasets, yet it is very efficient for training reward models that guide large language models (LLMs) toward high-quality answers that match human preferences.
Direct use: https://go.hyper.ai/YePhv
5. MMMLU Multilingual Massive Multitask Language Understanding Dataset
The dataset is designed to evaluate and improve the performance of AI models across different linguistic, cognitive, and cultural contexts. MMMLU is built on the Massive Multitask Language Understanding (MMLU) benchmark, a widely used measure of the general knowledge acquired by AI models. It contains 57 tasks in different subject areas, ranging from elementary knowledge to advanced professional disciplines such as law, physics, history, and computer science.
Direct use: https://go.hyper.ai/TY7aR
6. FRAMES-benchmark Retrieval-Augmented Generation Test Set
The dataset contains 824 challenging multi-hop questions, each requiring information from 2 to 15 Wikipedia articles. The questions cover a variety of topics such as history, sports, science, animals, and health, and each is labeled with its reasoning types, such as numerical, table, multiple constraints, temporal, and post-processing. The dataset also provides the gold answer and the related Wikipedia articles for each question.
Direct use: https://go.hyper.ai/zp5WQ
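To make the structure described above more concrete, here is a minimal Python sketch of what one question record might look like and how records could be filtered by reasoning type. The class name, field names, and sample records are all hypothetical placeholders chosen for illustration; they are not the dataset's actual schema or contents.

```python
from dataclasses import dataclass, field


@dataclass
class FramesQuestion:
    """Hypothetical record layout; field names are illustrative only."""
    question: str
    gold_answer: str
    wikipedia_articles: list[str] = field(default_factory=list)
    reasoning_types: list[str] = field(default_factory=list)


def filter_by_reasoning(records, reasoning_type):
    """Return the records tagged with the given reasoning type."""
    return [r for r in records if reasoning_type in r.reasoning_types]


# Two made-up placeholder records, not taken from the dataset.
records = [
    FramesQuestion(
        question="placeholder question A",
        gold_answer="placeholder answer A",
        wikipedia_articles=["Article 1", "Article 2"],
        reasoning_types=["numerical", "temporal"],
    ),
    FramesQuestion(
        question="placeholder question B",
        gold_answer="placeholder answer B",
        wikipedia_articles=["Article 3", "Article 4", "Article 5"],
        reasoning_types=["table"],
    ),
]

temporal = filter_by_reasoning(records, "temporal")
print([r.question for r in temporal])
```

Grouping questions by reasoning type in this way is one plausible approach to reporting per-category results for a retrieval-augmented generation system.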
7. MedScribble Multi-image Segmentation Biomedical Task Dataset
The dataset contains hand-drawn scribbles collected by the research team from 3 annotators, covering 14 segmentation tasks drawn from 14 different open-access biomedical image segmentation datasets. MedScribble contains a total of 64 2D image segmentation pairs, each with 3 sets of scribble annotations.
Direct use: https://go.hyper.ai/X901T
8. CDFSOD-benchmark Cross-Domain Few-Shot Object Detection Benchmark Dataset
This project addresses few-shot object detection when there is a significant gap between the source domain and the target domain. It includes a dataset for algorithm evaluation, as well as dataset metrics for measuring domain differences, such as style, inter-class variance (ICV), and indefinable boundaries (IB).
Direct use: https://go.hyper.ai/YQsnW
9. CLVR Jaco Play Dataset Teleoperated Robot Clip Dataset
This dataset is a valuable resource for scientists and developers working on robotic teleoperation, natural language processing, and human-robot interaction. It provides 1,085 clips of a teleoperated Jaco 2 robot with corresponding language annotations.
Direct use: https://go.hyper.ai/Xde69
10. Berkeley Cable Routing Multi-stage Robotic Cable Task Dataset
The Berkeley Cable Routing dataset is intended for studying multi-stage robotic manipulation, specifically cable routing. The task requires the robot to route a cable through a series of clips, which captures the challenges of complex multi-stage manipulation: handling deformable objects, closing the visual perception loop, and carrying out extended behaviors made up of multiple steps.
Direct use: https://go.hyper.ai/aiML0
For more public datasets, please visit:
Community Articles
In the third episode of the "Meet AI4S" live series, Zhou Ziyi, a postdoctoral fellow in Professor Hong Liang's research group at the Institute of Natural Sciences of Shanghai Jiao Tong University, presented the team's latest research under the title "Few-Shot Learning Methods for Protein Language Models" and explored new ideas for AI-assisted directed evolution. This article is a transcript of his talk.
View the full summary: https://go.hyper.ai/MzXfg
2. Jeff Dean likes Google's new research: Whale bioacoustic model can identify 8 types of whales
The Google Research team has developed a new whale bioacoustic model that can identify 8 of the 94 currently known whale species. This article is a detailed walkthrough of the paper.
View the full report: https://go.hyper.ai/1l2HO
The team of Professor Wu Mengyue from the X-LANCE Laboratory of Shanghai Jiao Tong University, in collaboration with the Tianqiao Institute for Brain Science and ThetaAI, built the Intelligent Psychological Clinic AMC, an automated simulation system of large-model dialogue agents for the preliminary diagnosis of depression. This article is a detailed walkthrough of the research paper.
View the full report: https://go.hyper.ai/AdjI5
The research group led by Zheng Shuangjia from Shanghai Jiao Tong University, in collaboration with Star Pharma, Sun Yat-sen University School of Pharmacy, and Rice University, proposed DynamicBind, a geometric deep generative model designed for dynamic protein docking. It offers drug development in the post-AlphaFold era a new deep learning research paradigm that accounts for the dynamic changes of proteins. This article is a detailed walkthrough of the research paper.
View the full report: https://go.hyper.ai/nErwd
David Baker, Demis Hassabis, and John M. Jumper won the 2024 Nobel Prize in Chemistry. DeepMind CEO Demis Hassabis said, "The best scientists working with these AI tools will be able to do incredible work." David Baker, for his part, said, "AlphaFold is very inspiring." This article is a detailed report on the winners of this year's Nobel Prize in Chemistry.
View the full report: https://go.hyper.ai/UPpuB
Popular Encyclopedia Articles
1. Transformer Model
2. Variational Autoencoder VAE
3. Artificial Neural Networks
4. Pareto Front
5. Massive Multitask Language Understanding (MMLU)
We have compiled hundreds of AI-related terms to help you understand "artificial intelligence":

One-stop tracking of top AI academic conferences: https://go.hyper.ai/event
That is all for this week's editor's selection. If you have resources that you would like to see included on the hyper.ai official website, you are welcome to leave a message or submit an article to let us know!
See you next week!
About HyperAI
HyperAI (hyper.ai) is the leading artificial intelligence and high-performance computing community in China. We are committed to becoming the infrastructure for data science in China and to providing rich, high-quality public resources for domestic developers. So far, we have:
* Provided domestic accelerated download nodes for 1300+ public datasets
* Included 400+ classic and popular online tutorials
* Interpreted 100+ AI4Science paper cases
* Supported search for 500+ related terms
* Hosted the first complete Apache TVM Chinese documentation in China
Visit the official website to start your learning journey: