Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Qwen-Image-Flash: Beyond Objective Design

OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs

Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

Audio Interaction Model

Cosmos 3: Omnimodal World Models for Physical AI

Learning, Fast and Slow: Towards LLMs That Adapt Continually

LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks

World Models Meet Language Models: On the Complementarity of Concrete and Abstract Reasoning

From Activation to Causality: Discovery of Causal Visual Representations in the Human Brain

A Local Perturbation Theory for Cross-Domain Interference and Recovery in Multi-Domain RL

Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

Trust Region On-Policy Distillation

OCC-RAG: Optimal Cognitive Core for Faithful Question Answering

MAI-Thinking-1: Building a Hill-Climbing Machine

VLM3: Vision Language Models Are Native 3D Learners

Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

DeepCrack: A deep hierarchical feature learning architecture for crack segmentation

VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion

Draft-OPD: On-Policy Distillation for Speculative Draft Models

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs

TACK: A statistical evaluation of degradation activity on a novel TArgeting Chimeras Knowledge dataset

Narrative Weaver: Towards Controllable Long-Range Visual Consistency with Multi-Modal Conditioning

Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents

LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

Trust-Region Behavior Blending for On-Policy Distillation

SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue

Representation Forcing for Bottleneck-Free Unified Multimodal Models
