AI Paper Weekly Report | General Agent Development / Object Detection / Open-Source Physics Reasoning Models... A Glimpse Into the Latest AI Developments in One Article

3 months ago

In recent years, progress in large language models (LLMs) has pushed the research frontier from puzzle-solving tasks toward scientific reasoning: the ability to handle complex problems whose answers must be tested against the laws of nature rather than a scoring rubric. Physics is the most rigorous test of this shift because it ties symbolic systems directly to the real world and underpins most modern technology.

Against this backdrop, a research team from the Shanghai Artificial Intelligence Laboratory has developed large language models with strong physics reasoning capabilities that excel at Olympiad-level problems. The researchers propose P1, a series of open-source physics reasoning models trained entirely through reinforcement learning (RL). Among them, P1-235B-A22B is the first open-source model to achieve gold-medal-level performance at the 2025 International Physics Olympiad (IPhO 2025), and it won 12 gold medals across 13 international and regional physics competitions held in 2024 and 2025.

Paper link: https://go.hyper.ai/NxT8f

Latest AI Papers: https://go.hyper.ai/hzChC

To help more readers keep up with the latest academic developments in artificial intelligence, HyperAI's official website (hyper.ai) has launched a "Latest Papers" section that is updated daily with cutting-edge AI research papers. Here are 5 popular AI papers we recommend. Let's take a quick look at this week's cutting-edge AI achievements ⬇️

This Week's Paper Recommendations

1. Lumine: An Open Recipe for Building Generalist Agents in 3D Open World

This paper proposes Lumine, the first open recipe for building generalist agents that can carry out hours-long complex tasks in real time in challenging 3D open-world environments. Lumine adopts a human-like interaction paradigm in which a vision-language model unifies perception, reasoning, and action end to end. It processes raw pixel input at 5 frames per second, produces precise keyboard and mouse actions at 30 per second, and invokes explicit reasoning only when necessary.
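The gap between perceiving 5 times per second and acting 30 times per second implies that several actions are issued for every observed frame. The sketch below works through that arithmetic. The function names are hypothetical, and the assumption that actions are spread evenly between frames is inferred from the two rates rather than taken from the paper.

```python
def actions_per_frame(perception_hz: int = 5, action_hz: int = 30) -> int:
    """Number of action steps issued for each perceived frame."""
    if action_hz % perception_hz != 0:
        raise ValueError("action rate must be a multiple of the perception rate")
    return action_hz // perception_hz


def action_timestamps_ms(frame_index: int, perception_hz: int = 5,
                         action_hz: int = 30) -> list[int]:
    """Millisecond timestamps (rounded down) of the actions following one frame.

    Assumes evenly spaced actions; this is an illustration, not Lumine's scheduler.
    """
    n = actions_per_frame(perception_hz, action_hz)
    first_step = frame_index * n
    # Integer arithmetic avoids floating-point drift over long sessions.
    return [(first_step + k) * 1000 // action_hz for k in range(n)]


print(actions_per_frame())      # 6 actions per observed frame
print(action_timestamps_ms(0))  # [0, 33, 66, 100, 133, 166]
```

Under these assumptions, each frame arriving every 200 ms has to be answered with six keyboard and mouse actions, roughly one every 33 ms.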

Paper link: https://go.hyper.ai/wfGhN

2. YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception

This paper proposes YOLOv13, an accurate and lightweight object detector. The researchers also propose a hypergraph-based adaptive correlation enhancement mechanism (HyperACE). Using hypergraph computation, it adaptively mines latent higher-order correlations, overcoming the restriction of earlier methods to pairwise correlation modeling. This mechanism achieves efficient global feature fusion and enhancement across locations and scales.
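To illustrate what "higher-order" means here, the toy sketch below runs one round of generic hypergraph message passing, in which a single hyperedge can link more than two vertices at once. It is not HyperACE itself: the fixed hyperedge membership, the scalar features, and the function name are all assumptions made for illustration, whereas the paper describes correlations that are mined adaptively.

```python
def hypergraph_message_pass(features, hyperedges):
    """One round of vertex -> hyperedge -> vertex mean aggregation.

    features:   list of scalar vertex features (toy stand-in for feature vectors)
    hyperedges: list of sets of vertex indices; a set may hold more than two vertices
    """
    # Step 1: each hyperedge averages the features of all its member vertices.
    edge_feats = [sum(features[v] for v in edge) / len(edge) for edge in hyperedges]

    # Step 2: each vertex averages the features of the hyperedges it belongs to.
    updated = []
    for v, own in enumerate(features):
        incident = [edge_feats[e] for e, edge in enumerate(hyperedges) if v in edge]
        # A vertex with no hyperedge keeps its own feature unchanged.
        updated.append(sum(incident) / len(incident) if incident else own)
    return updated


features = [1.0, 2.0, 3.0, 10.0]
hyperedges = [{0, 1, 2}, {2, 3}]  # the first hyperedge groups three vertices at once
print(hypergraph_message_pass(features, hyperedges))  # [2.0, 2.0, 4.25, 6.5]
```

An ordinary graph edge would only ever connect two of these vertices. The three-member hyperedge lets vertices 0, 1, and 2 exchange information as a group in a single step.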

Paper link: https://go.hyper.ai/cKMGI

3. Generating an Image From 1,000 Words: Enhancing Text-to-Image With Structured Captions

This paper presents FIBO, the first open-source text-to-image model trained on long structured captions, in which every training sample is annotated with the same set of fine-grained attributes. This design greatly expands expressive coverage and enables disentangled control over visual factors. To process long captions efficiently, the researchers propose DimFusion, a fusion mechanism that integrates intermediate tokens from a lightweight large language model (LLM) without increasing token length.

Paper link: https://go.hyper.ai/zyUcE

4. Depth Anything 3: Recovering the Visual Space from Any Views

This paper proposes Depth Anything 3 (DA3), a model that predicts spatially consistent geometry from any number of visual inputs, whether or not their camera poses are known. The researchers also built a new visual geometry benchmark covering camera pose estimation, any-view geometry reconstruction, and visual rendering. On this benchmark, DA3 sets a new state of the art across all tasks, improving on the previous state-of-the-art method, VGGT, by an average of 44.3% in camera pose accuracy and 25.1% in geometric accuracy.

Paper link: https://go.hyper.ai/WvSU4

5. P1: Mastering Physics Olympiads with Reinforcement Learning

This paper advances physics research by developing large language models with strong physics reasoning capabilities that excel at Olympiad-level problems. The authors propose P1, a series of open-source physics reasoning models trained entirely through reinforcement learning (RL).

Paper link: https://go.hyper.ai/NxT8f

That's all for this week's paper recommendations. For more cutting-edge AI research papers, please visit the “Latest Papers” section of hyper.ai’s official website.

We also welcome research teams to submit high-quality results and papers to us. Those interested can add the NeuroStar WeChat (WeChat ID: Hyperai01).

See you next week!
