TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs
Yunheng Li, Jing Cheng, Shaoyong Jia, Hangyi Kuang, Shaohui Jiao, Qibin Hou, Ming-Ming Cheng

Abstract
This paper introduces TempSamp-R1, a new reinforcement fine-tuning framework designed to improve the effectiveness of adapting multimodal large language models (MLLMs) to video temporal grounding tasks. We reveal that existing reinforcement learning methods, such as Group Relative Policy Optimization (GRPO), rely on on-policy sampling for policy updates. However, in tasks with large temporal search spaces, this strategy becomes both inefficient and limited in performance, as it often fails to identify temporally accurate solutions. To address this limitation, TempSamp-R1 leverages ground-truth annotations as off-policy supervision to provide temporally precise guidance, effectively compensating for the sparsity and misalignment of on-policy solutions. To further stabilize training and reduce variance in reward-based updates, TempSamp-R1 introduces a non-linear soft advantage computation method that dynamically reshapes the reward feedback via an asymmetric transformation. By employing a hybrid Chain-of-Thought (CoT) training paradigm, TempSamp-R1 optimizes a single unified model to support both CoT and non-CoT inference modes, enabling efficient handling of queries with varying reasoning complexity. Experimental results demonstrate that TempSamp-R1 outperforms GRPO-based baselines, establishing new state-of-the-art performance on benchmark datasets: Charades-STA (R1@0.7: 52.9%, +2.7%), ActivityNet Captions (R1@0.5: 56.0%, +5.3%), and QVHighlights (mAP: 30.0%, +3.0%). Moreover, TempSamp-R1 shows robust few-shot generalization capabilities under limited data. Code: https://github.com/HVision-NKU/TempSamp-R1
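To make the sampling idea concrete, the sketch below illustrates one plausible reading of the abstract: a GRPO-style group advantage computation that mixes on-policy rollout rewards with a single off-policy reward derived from the ground-truth annotation, and applies an asymmetric, saturating transform to that reward before group normalization. This is a minimal illustration under our own assumptions, not the authors' implementation; the function names (`soft_shape`, `mixed_group_advantages`) and the specific tanh-based transform are hypothetical.

```python
# Hypothetical sketch of hybrid on-/off-policy advantage computation.
# Names and the exact asymmetric transform are illustrative assumptions,
# not the TempSamp-R1 implementation.
import numpy as np

def soft_shape(reward: float, group_mean: float, tau: float = 0.5) -> float:
    """Asymmetric non-linear reshaping (assumed form): rewards above the
    on-policy group mean are compressed with a saturating tanh so the
    ground-truth sample does not dominate the advantage estimates."""
    gap = reward - group_mean
    if gap <= 0:
        return reward
    return group_mean + tau * np.tanh(gap / tau)

def mixed_group_advantages(on_policy_rewards, gt_reward, tau=0.5, eps=1e-6):
    """Combine G-1 on-policy rollout rewards with one off-policy reward from
    the ground-truth annotation, reshape it, then normalize within the group
    (GRPO-style: subtract the group mean, divide by the group std)."""
    rewards = np.asarray(list(on_policy_rewards), dtype=np.float64)
    shaped_gt = soft_shape(float(gt_reward), rewards.mean(), tau)
    group = np.append(rewards, shaped_gt)
    return (group - group.mean()) / (group.std() + eps)

# Example: sparse on-policy rewards plus a high-reward ground-truth sample.
print(mixed_group_advantages([0.1, 0.0, 0.3, 0.2], gt_reward=0.9))
```

The point of the reshaping step in this sketch is variance control: without it, a single near-perfect off-policy reward would produce one very large positive advantage and push all on-policy samples strongly negative, which is the instability the abstract attributes to naive reward-based updates.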