AI Weekly Paper Report: Latest Research From Alibaba, Xiamen University, Zhejiang University, and More, Covering Reinforcement Learning Optimization Algorithms, GUI Agents, Multimodal Context Compression, and More

7 months ago

As large language models (LLMs) continue to scale, efficient and stable reinforcement learning (RL) training has become a key challenge. To address it, Alibaba Group's Qwen team proposed Group Sequence Policy Optimization (GSPO), a new reinforcement learning algorithm.

Unlike earlier methods that rely on token-level importance ratios, GSPO defines the importance ratio based on sequence likelihood and performs clipping, rewarding, and optimization at the sequence level, which markedly improves training stability and efficiency. GSPO is particularly effective at stabilizing RL training of Mixture-of-Experts (MoE) models, has the potential to simplify the design of RL infrastructure, and has contributed to notable performance gains in the latest Qwen3 models.
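For readers who want the formulas, the sequence-level ratio, group-normalized advantage, and clipped objective can be summarized roughly as follows. This is our reading of the GSPO paper with our own notation; consult the paper for the exact definitions and hyperparameters. Here $x$ is a prompt, $y_1,\dots,y_G$ are $G$ responses sampled from the old policy $\pi_{\theta_{\text{old}}}$, $r$ is the reward, and $\varepsilon$ is the clipping range.

```latex
s_i(\theta) = \left( \frac{\pi_\theta(y_i \mid x)}{\pi_{\theta_{\text{old}}}(y_i \mid x)} \right)^{1/|y_i|}
            = \exp\!\left( \frac{1}{|y_i|} \sum_{t=1}^{|y_i|} \log \frac{\pi_\theta(y_{i,t} \mid x, y_{i,<t})}{\pi_{\theta_{\text{old}}}(y_{i,t} \mid x, y_{i,<t})} \right)

\hat{A}_i = \frac{r(x, y_i) - \operatorname{mean}\{ r(x, y_j) \}_{j=1}^{G}}{\operatorname{std}\{ r(x, y_j) \}_{j=1}^{G}}

\mathcal{J}_{\text{GSPO}}(\theta) = \mathbb{E}\left[ \frac{1}{G} \sum_{i=1}^{G} \min\!\Big( s_i(\theta)\,\hat{A}_i,\ \operatorname{clip}\big(s_i(\theta),\, 1-\varepsilon,\, 1+\varepsilon\big)\,\hat{A}_i \Big) \right]
```

The $1/|y_i|$ exponent length-normalizes the ratio, so responses of different lengths produce ratios on a comparable scale, and each response receives a single ratio rather than one per token.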

Paper link: https://go.hyper.ai/FOrdj

Latest AI Papers: https://go.hyper.ai/hzChC

To help more readers keep up with the latest developments in AI research, HyperAI's official website (hyper.ai) has launched a "Latest Papers" section that is updated daily with cutting-edge AI research papers. Below are 5 popular AI papers we recommend; let’s take a quick look at this week’s cutting-edge AI research ⬇️

This week's paper recommendations

1 Group Sequence Policy Optimization

This paper introduces Group Sequence Policy Optimization (GSPO), a stable, efficient, and high-performing reinforcement learning algorithm for training large language models. Unlike previous algorithms that use token-level importance ratios, GSPO defines the importance ratio based on sequence likelihood and performs sequence-level clipping, rewarding, and optimization.
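As a rough illustration of the sequence-level idea, the sketch below computes a length-normalized sequence importance ratio from per-token log-probabilities, normalizes rewards within a group, and averages a PPO-style clipped surrogate over the group. It is a plain-Python sketch under stated assumptions (population standard deviation, a small epsilon against division by zero, a symmetric placeholder `clip_eps`, no batching or gradient computation), not the Qwen team's implementation; all function names are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def sequence_ratio(logp_new: Sequence[float], logp_old: Sequence[float]) -> float:
    """Length-normalized sequence importance ratio:
    exp(mean over tokens of [log pi_new(y_t) - log pi_old(y_t)])."""
    diffs = [n - o for n, o in zip(logp_new, logp_old)]
    return math.exp(sum(diffs) / len(diffs))


def group_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Score each response relative to its group: (r - mean) / std."""
    mu, sigma = mean(rewards), pstdev(rewards)  # population std is an assumption
    return [(r - mu) / (sigma + eps) for r in rewards]


def gspo_objective(new_logps: Sequence[Sequence[float]],
                   old_logps: Sequence[Sequence[float]],
                   rewards: Sequence[float],
                   clip_eps: float = 0.2) -> float:
    """Group-averaged clipped surrogate (to be maximized), one ratio per response."""
    advs = group_advantages(rewards)
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logps, old_logps, advs):
        s = sequence_ratio(lp_new, lp_old)
        s_clipped = min(max(s, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (min) choice between unclipped and clipped terms, as in PPO.
        total += min(s * adv, s_clipped * adv)
    return total / len(rewards)
```

When the new and old log-probabilities are identical, every ratio is exactly 1 and the objective reduces to the mean normalized advantage, which is zero by construction. Because the ratio is computed once per response, clipping keeps or suppresses a whole response's contribution rather than individual tokens.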

Paper link: https://go.hyper.ai/FOrdj

2 UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding

Existing GUI agent training and inference methods still face challenges such as dilemmas in reasoning design, ineffective reward mechanisms, and interference from visual noise. This paper proposes UI-AGILE, a framework that improves GUI agents at both the training and inference stages. For inference, it introduces Decomposed Grounding with Selection, a method that significantly improves grounding accuracy on high-resolution displays by dividing the screenshot into smaller, more manageable parts. Experiments show that UI-AGILE achieves state-of-the-art performance on two benchmarks, ScreenSpot-Pro and ScreenSpot-v2.
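The "divide the screenshot into smaller parts" step can be sketched as follows: split a high-resolution screenshot into a grid of slightly overlapping crops, run a grounding model on each crop, map any predicted click point back to full-screen coordinates, and keep the candidate a selector scores highest. The grid size, overlap, and the `ground` and `score` callables are hypothetical stand-ins; the paper's actual cropping and selection procedure may differ.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in full-screenshot pixels
Point = Tuple[int, int]          # (x, y)


def tile_boxes(width: int, height: int, rows: int, cols: int,
               overlap: float = 0.1) -> List[Box]:
    """Split a width x height screenshot into a rows x cols grid of crops.

    Each crop is enlarged by `overlap` (a fraction of the tile size) on every
    side and clamped to the image, so a target sitting on a tile border is
    more likely to appear whole in at least one crop."""
    tile_w, tile_h = width / cols, height / rows
    pad_w, pad_h = tile_w * overlap, tile_h * overlap
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            left = max(0, int(c * tile_w - pad_w))
            top = max(0, int(r * tile_h - pad_h))
            right = min(width, int((c + 1) * tile_w + pad_w))
            bottom = min(height, int((r + 1) * tile_h + pad_h))
            boxes.append((left, top, right, bottom))
    return boxes


def decomposed_grounding(width: int, height: int, instruction: str,
                         ground: Callable[[Box, str], Optional[Point]],
                         score: Callable[[Point, str], float],
                         rows: int = 2, cols: int = 2) -> Optional[Point]:
    """Ground `instruction` on each crop, map hits back to full-screen
    coordinates, and return the candidate with the highest selection score."""
    candidates: List[Tuple[float, Point]] = []
    for box in tile_boxes(width, height, rows, cols):
        local = ground(box, instruction)  # hypothetical grounding model call on one crop
        if local is None:
            continue
        # Translate crop-local coordinates to full-screenshot coordinates.
        point = (box[0] + local[0], box[1] + local[1])
        candidates.append((score(point, instruction), point))
    if not candidates:
        return None
    return max(candidates, key=lambda cand: cand[0])[1]
```

Working on smaller crops lets each model call see UI elements at a larger relative size; the price is one grounding call per crop plus a selection step to reconcile conflicting candidates.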

Paper link: https://go.hyper.ai/SRpdE

3 When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios

This paper presents the first systematic survey of multimodal long-context token compression, a rapidly developing research area. Because each modality has its own characteristics and forms of redundancy, the authors categorize existing methods by the type of data they primarily target (image-centric, video-centric, and audio-centric compression), so that readers can quickly find the methods relevant to their own research area.
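The survey is a taxonomy rather than a single algorithm, but a toy example may make "redundancy" concrete. The sketch below shows one generic video-centric idea: consecutive frames that look nearly identical carry little new information, so a frame is dropped when its embedding is too similar to the last frame kept. It is our own illustration, not a method taken from the survey; the per-frame embeddings and the threshold value are assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def prune_redundant_frames(frames: Sequence[Sequence[float]],
                           threshold: float = 0.95) -> List[int]:
    """Return indices of frames to keep.

    The first frame is always kept; each later frame is dropped if its
    embedding is nearly identical (cosine >= threshold) to the most recently
    kept frame, which removes runs of static or slowly changing frames."""
    if not frames:
        return []
    kept = [0]
    for i in range(1, len(frames)):
        if cosine(frames[i], frames[kept[-1]]) < threshold:
            kept.append(i)
    return kept
```

For example, `prune_redundant_frames([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.0, 1.0]])` keeps frames 0 and 2. A higher threshold compresses less but is less likely to discard brief yet important changes.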

Paper link: https://go.hyper.ai/nOYw4

4 SciToolAgent: A Knowledge Graph-Driven Scientific Agent for Multi-Tool Integration

This paper presents SciToolAgent, an LLM-powered agent that automates hundreds of scientific tools across biology, chemistry, and materials science. At its core is a scientific tool knowledge graph, which enables intelligent tool selection and execution through graph-based retrieval-augmented generation (RAG). The system also integrates a comprehensive safety-checking module to ensure responsible and ethical tool use.
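To make graph-based retrieval for tool selection more concrete, here is a toy sketch: concept nodes mentioned in a query seed a breadth-first walk over a small graph, the tool nodes reached become candidates, and a restricted-tool list stands in for the safety check. Everything here (graph contents, tool names, `max_hops`, the keyword matching, the restriction policy) is hypothetical; SciToolAgent's actual knowledge graph, LLM-driven retrieval and planning, and safety module are considerably richer.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical toy graph: concept nodes link to tool nodes, and tools link to
# tools that can consume their output. None of these names come from the paper.
GRAPH: Dict[str, Set[str]] = {
    "protein": {"sequence_aligner", "structure_predictor"},
    "molecule": {"property_calculator", "synthesis_planner"},
    "structure_predictor": {"structure_visualizer"},
    "synthesis_planner": {"hazard_screen"},
}
TOOLS: Set[str] = {
    "sequence_aligner", "structure_predictor", "structure_visualizer",
    "property_calculator", "synthesis_planner", "hazard_screen",
}
# Hypothetical safety policy: tools the agent may not run without human review.
RESTRICTED: Set[str] = {"synthesis_planner"}


def retrieve_tools(query: str, max_hops: int = 2) -> List[str]:
    """Seed with concept nodes named in the query, then walk outgoing edges
    breadth-first (up to max_hops) and collect tool nodes in visit order."""
    words = set(query.lower().split())
    seeds = [node for node in GRAPH if node in words]
    seen: Set[str] = set(seeds)
    order: List[str] = []
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, depth = queue.popleft()
        if node in TOOLS:
            order.append(node)
        if depth == max_hops:
            continue
        for nxt in sorted(GRAPH.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return order


def safety_filter(tools: List[str]) -> List[str]:
    """Drop restricted tools before execution (stand-in for a safety module)."""
    return [t for t in tools if t not in RESTRICTED]
```

With this toy graph, `retrieve_tools("predict the protein structure")` returns `["sequence_aligner", "structure_predictor", "structure_visualizer"]`. Walking edges rather than matching tool descriptions independently is what lets a graph surface downstream tools (here, the visualizer) that the query never mentions.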

Paper link: https://go.hyper.ai/IOiRk

5 SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment

This paper presents SmallThinker, a family of LLMs natively designed for local devices rather than compressed from cloud models. They are tailored to the distinctive constraints of local devices: limited computing power, limited memory, and slow storage. SmallThinker's architecture is redesigned to run efficiently in these constrained environments; at its core is an innovative "deployment-oriented" architecture that turns system constraints into design principles.

Paper link: https://go.hyper.ai/tSwpG

That’s all for this week’s paper recommendations. For more cutting-edge AI research papers, visit the “Latest Papers” section of hyper.ai’s official website.

We also welcome research teams to submit high-quality results and papers to us. Those interested can add NeuroStar on WeChat (WeChat ID: Hyperai01).

See you next week!
