AI Paper Weekly Report | ReasonMed, the Largest Medical Reasoning Dataset, Contains 370,000 Samples; Microsoft, Peking University, and Tsinghua University Propose Reinforcement Pre-Training to Improve Next-Token Prediction Accuracy

As AI technology develops rapidly, academic results and research papers are emerging in an endless stream. According to the White Paper on Scientific Intelligence 2025, the number of AI journal papers worldwide has more than tripled over the past decade, from 308,900 to 954,500. Behind this vast body of work lies not only the wisdom of researchers, but also the traces of their efforts to overcome difficulties and explore the future.
From the Transformer architecture that brought a breakthrough in language models to the diffusion models that redefined what is possible in image generation; from the deep application of reinforcement learning in autonomous driving to the significant progress in AI-assisted medical diagnosis... Every leap forward in artificial intelligence began with a series of papers distilling hard-won insight. It is these academic treasures that together weave the "technical map" driving the industry forward.
To help more users keep up with the latest academic developments in artificial intelligence, HyperAI's official website (hyper.ai) has launched a "Latest Papers" section, updated daily with cutting-edge AI research papers across vertical fields such as machine learning, computational linguistics, computer vision and pattern recognition, and human-computer interaction. Come and have a look~
Latest AI Papers: https://go.hyper.ai/owxf6
Below, HyperAI has carefully selected 5 popular AI papers updated from June 9 to June 13. Let's learn together~
This Week's Paper Recommendations
1 Reinforcement Pre-Training
This study proposes a new language model pre-training method, Reinforcement Pre-Training (RPT), which reframes next-token prediction as a reasoning task and trains it with reinforcement learning, rewarding the model for correctly predicting the next token given its context. Experimental results show that RPT not only significantly improves next-token prediction accuracy, but also provides a stronger foundation for subsequent RL fine-tuning, improving zero-shot performance on downstream tasks.
Paper link: https://go.hyper.ai/Pxpgk
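
The core idea can be illustrated with a toy example. The sketch below is not the paper's code: it uses a tiny bigram model and a REINFORCE-style update in which the reward is 1 only when the sampled next token matches the ground-truth token; RPT applies this kind of verifiable reward to a full LLM that reasons about the context before committing to a prediction.

```python
# Toy sketch of the RPT reward idea: sample a next-token prediction, reward an
# exact match, and update the policy with REINFORCE. The bigram "model" and
# corpus are illustrative stand-ins, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat"]
tok = {w: i for i, w in enumerate(vocab)}
corpus = ["the", "cat", "sat", "on", "the", "mat"]
V = len(vocab)

logits = np.zeros((V, V))      # bigram policy: P(next | current token)
lr, baseline = 0.5, 0.0

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for step in range(200):
    # Pick a random position; the "context" is the current token.
    i = rng.integers(0, len(corpus) - 1)
    ctx, target = tok[corpus[i]], tok[corpus[i + 1]]

    probs = softmax(logits[ctx])
    action = rng.choice(V, p=probs)            # model's next-token prediction
    reward = 1.0 if action == target else 0.0  # verifiable reward: exact match

    # REINFORCE update on log p(action | ctx), with a running baseline.
    baseline = 0.9 * baseline + 0.1 * reward
    grad = -probs
    grad[action] += 1.0                        # gradient of log p(action | ctx)
    logits[ctx] += lr * (reward - baseline) * grad

print(vocab[int(np.argmax(softmax(logits[tok["the"]])))])  # likely "cat" or "mat"
```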


2 Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
This study introduces a reinforcement learning method for large language models (LLMs): unsupervised fine-tuning via self-confidence (RLSC), which uses the model's own confidence in its answers as the reward signal rather than labeled data. Experimental results show that with very little data (16 samples per question and 10 or 20 training steps), the method significantly improves accuracy on multiple mathematical reasoning tasks.
Paper link: https://go.hyper.ai/rFuVl
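
As a rough illustration of how "self-confidence" can serve as a label-free reward, the sketch below rewards each sampled answer by how often the model agrees with itself; the sampler and the exact reward definition are assumptions for illustration, not necessarily the paper's objective.

```python
# Illustrative sketch: use the model's agreement with its own samples as an
# unsupervised reward. `sample_answers` is a hypothetical stand-in for LLM
# sampling; the reward here is an agreement proxy for self-confidence.
from collections import Counter
from typing import Callable, List, Tuple

def self_confidence_rewards(
    question: str,
    sample_answers: Callable[[str, int], List[str]],  # hypothetical sampler
    k: int = 16,                                       # samples per question
) -> List[Tuple[str, float]]:
    """Sample k answers and reward each by how often the model agrees with it."""
    answers = sample_answers(question, k)
    counts = Counter(answers)
    # Reward = empirical probability that another sample gives the same answer.
    return [(a, counts[a] / k) for a in answers]

# Toy usage with a stand-in sampler (replace with real LLM sampling).
if __name__ == "__main__":
    import random
    random.seed(0)
    fake_sampler = lambda q, k: random.choices(["42", "41", "42", "43"], k=k)
    for answer, reward in self_confidence_rewards("What is 6 x 7?", fake_sampler):
        print(answer, reward)
    # These (answer, reward) pairs would then drive a policy-gradient update
    # that sharpens the model's answer distribution, with no labels required.
```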


3 Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA
This study proposes a new way to evaluate and improve the performance of large language models (LLMs) on question answering, focusing on whether questions are time-sensitive or "evergreen" (their answers do not change over time). The study shows that EG-E5 outperforms all tested models at judging whether a question is evergreen. Further analysis shows that, when using uncertainty indicators to probe what LLMs know, incorporating the probabilistic "evergreen" signal significantly improves the quality and accuracy of the evaluation.
Paper link: https://go.hyper.ai/zOGjT
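
As a loose illustration of the idea (the combination rule below is an assumption for illustration, not the paper's formula), one could discount a model's confidence on questions whose answers are likely to drift over time:

```python
# Hypothetical sketch: fold an "evergreen" probability into a QA confidence
# estimate by down-weighting confidence on time-sensitive questions.
def adjusted_confidence(model_confidence: float, p_evergreen: float) -> float:
    """Scale the model's confidence by the probability that the answer is stable."""
    return model_confidence * p_evergreen

print(adjusted_confidence(0.9, 0.95))  # evergreen question: confidence mostly retained
print(adjusted_confidence(0.9, 0.20))  # mutable question: confidence sharply discounted
```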


4 ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
This study introduces ReasonMed, a large-scale medical reasoning dataset generated by a multi-agent system, aimed at improving language-model-based medical question answering. Roughly 1.7 million initial reasoning paths were generated by different large language models, then rigorously verified and refined down to 370,000 high-quality examples. The article also examines how various training strategies affect medical reasoning models, finding that a hybrid approach combining detailed chain-of-thought (CoT) reasoning with concise answer summaries is the most effective.
Paper link: https://go.hyper.ai/XyO0s
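
A schematic of the generate-verify-refine pattern described above might look like the following; the agent, verifier, and refiner functions are hypothetical stand-ins rather than the actual ReasonMed pipeline.

```python
# Schematic generate -> verify -> refine pipeline of the kind described for
# ReasonMed. All callables here are illustrative stand-ins; the real dataset
# uses multiple LLM agents and stricter validation.
from typing import Callable, Iterable, List, NamedTuple

class Candidate(NamedTuple):
    question: str
    reasoning: str   # chain-of-thought trace
    answer: str

def build_dataset(
    questions: Iterable[str],
    generators: List[Callable[[str], Candidate]],   # one per LLM agent
    verify: Callable[[Candidate], bool],             # e.g. answer-key or judge check
    refine: Callable[[Candidate], Candidate],        # e.g. summarize the CoT
) -> List[Candidate]:
    kept = []
    for q in questions:
        # Each agent proposes a reasoning path for the same question.
        for gen in generators:
            cand = gen(q)
            if verify(cand):                 # discard unverified paths
                kept.append(refine(cand))    # keep a cleaned, concise version
    return kept

# Toy usage with stand-in agents (replace with real LLM calls and checks).
if __name__ == "__main__":
    agents = [lambda q: Candidate(q, "step-by-step reasoning...", "B"),
              lambda q: Candidate(q, "alternative reasoning...", "C")]
    data = build_dataset(
        questions=["Deficiency of which vitamin causes scurvy?"],
        generators=agents,
        verify=lambda c: c.answer == "B",    # hypothetical answer-key check
        refine=lambda c: c,                  # identity refiner for the toy example
    )
    print(len(data))  # 1 verified, refined example kept
```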


5 UniSim: A Unified Simulator for Time-Coarsened Dynamics of Biomolecules
This study introduces a new deep learning model, the Unified Simulator (UniSim), which leverages cross-domain knowledge to capture the atomic-level behavior of molecular systems and to run efficient long-timescale, time-coarsened dynamics simulations. Experimental results show that UniSim is highly competitive across small molecules, peptide chains, and proteins, particularly in transfer learning and long-range dynamics simulation.
Paper link: https://go.hyper.ai/0Eqsu
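
The notion of time-coarsened dynamics can be sketched as an autoregressive rollout of a learned propagator that jumps the system forward by a large time step; the propagator below is a toy stand-in, not UniSim itself.

```python
# Sketch of a time-coarsened rollout: a learned propagator predicts the system
# state one large time step ahead and is applied autoregressively. The
# `toy_propagator` is a hypothetical stand-in for a trained model.
import numpy as np

def rollout(x0: np.ndarray, propagator, n_steps: int) -> np.ndarray:
    """Autoregressively apply a coarse-time propagator to atomic coordinates."""
    traj = [x0]
    for _ in range(n_steps):
        traj.append(propagator(traj[-1]))
    return np.stack(traj)

# Toy usage: a damped random walk stands in for the learned propagator.
rng = np.random.default_rng(0)
toy_propagator = lambda x: 0.99 * x + 0.01 * rng.standard_normal(x.shape)
trajectory = rollout(rng.standard_normal((10, 3)), toy_propagator, n_steps=5)
print(trajectory.shape)  # (6, 10, 3): initial state plus 5 coarse steps
```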


That's all for this week's paper recommendations. For more cutting-edge AI research papers, please visit the "Latest Papers" section on hyper.ai.
We also welcome research teams to submit their high-quality results and papers. Those interested can add NeuroStar on WeChat (WeChat ID: Hyperai01).
See you next week!