Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling

Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models

BeSafe-Bench: Unveiling Behavioral Safety Risks of Situated Agents in Functional Environments

World Reasoning Arena

MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens

Voxtral TTS

RealRestorer: Towards Generalizable Real-World Image Restoration with Large-Scale Image Editing Models

Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

PixelSmile: Toward Fine-Grained Facial Expression Editing

Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs

AutoHarness: Improving LLM Agents by Automatically Synthesizing a Code Harness

GameplayQA: A Benchmarking Framework for Decision-Dense POV-Synced Multi-Video Understanding of 3D Virtual Agents

Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?

UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience

T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search

CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents

EVA: Efficient Reinforcement Learning for End-to-End Video Agent

Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation

Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos

From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents

SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

DA-Flow: Degradation-Aware Optical Flow Estimation with Diffusion Models

PEARL: Personalized Streaming Video Understanding Model

WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG

MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding

PivotRL: High Accuracy Agentic Post-Training at Low Compute Cost

F4Splat: Feed-Forward Predictive Densification for Feed-Forward 3D Gaussian Splatting

SpatialBoost: Enhancing Visual Representation through Language-Guided Reasoning

VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding

LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning

Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model