Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle

Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
VOccl3D: A Video Benchmark Dataset for 3D Human Pose and Shape Estimation under real Occlusions
Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation
Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach
OneThinker: All-in-one Reasoning Model for Image and Video
ViDiC: Video Difference Captioning
PretrainZero: Reinforcement Active Pretraining
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models
SimScale: Learning to Drive via Real-World Simulation at Scale
Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch
Guided Self-Evolving LLMs with Minimal Human Supervision
MultiShotMaster: A Controllable Multi-Shot Video Generation Framework
MG-Nav: Dual-Scale Visual Navigation via Sparse Spatial Memory
The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment
How Far Are We from Genuinely Useful Deep Research Agents?
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection
Mem-α: Learning Memory Construction via Reinforcement Learning
Search Self-play: Pushing the Frontier of Agent Capability without Supervision
CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization
ScaleNet: Scaling up Pretrained Neural Networks with Incremental Parameters
Optimizing Mixture of Block Attention
FractalForensics: Proactive Deepfake Detection and Localization via Fractal Watermarks
Chain-of-Thought Hijacking
InstanceAssemble: Layout-Aware Image Generation via Instance Assembling Attention