Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

Show-o2: Improved Native Unified Multimodal Models

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Raptor: Scalable Train-Free Embeddings for 3D Medical Volumes Leveraging Pretrained 2D Foundation Models
EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection
s1: Simple test-time scaling
Search-o1: Agentic Search-Enhanced Large Reasoning Models
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
GPT-4o System Card
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
Aria: An Open Multimodal Native Mixture-of-Experts Model
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
VGGT: Visual Geometry Grounded Transformer
Multi-Turn Code Generation Through Single-Step Rewards
Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability
Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence
Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation
BUT System for the MLC-SLM Challenge
GenRecal: Generation after Recalibration from Large to Small Vision-Language Models
ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
Sekai: A Video Dataset towards World Exploration
Data-driven material screening of secondary and natural cementitious precursors
QFFT, Question-Free Fine-Tuning for Adaptive Reasoning
Can LLMs Generate High-Quality Test Cases for Algorithm Problems? TestCase-Eval: A Systematic Evaluation of Fault Coverage and Exposure
AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy
Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model
Efficient Medical VIE via Reinforcement Learning
Scaling Test-time Compute for LLM Agents
Iterative transcription factor screening enables rapid generation of microglia-like cells from human iPSC
TaskCraft: Automated Generation of Agentic Tasks