Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

Diffusion Templates: A Unified Plugin Framework for Controllable Diffusion

RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments

ClawGym: A Scalable Framework for Building Effective Claw Agents
Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models
Large Language Models Explore by Latent Distilling
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
SWE-chat: Coding Agent Interactions From Real Users in the Wild
AdaExplore: Failure-Driven Adaptation and Diversity-Preserving Search for Efficient Kernel Generation
Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models
AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery
Meta-CoT: Enhancing Granularity and Generalization in Image Editing
DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios
Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora
Recursive Multi-Agent Systems
Skill Retrieval Augmentation for Agentic AI
SketchVLM: Vision language models can annotate images to explain thoughts and guide users
RSRCC: A Remote Sensing Regional Change Comprehension Benchmark Constructed via Retrieval-Augmented Best-of-N Ranking
LongSpeech: A Scalable Benchmark for Transcription, Translation and Understanding in Long Speech
ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms
ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning
From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
Video Analysis and Generation via a Semantic Progress Function
SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing
Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets
AgentSearchBench: A Benchmark for AI Agent Search in the Wild
FlowAnchor: Stabilizing the Editing Signal for Inversion-Free Video Editing
LLM Safety From Within: Detecting Harmful Content with Internal Representations
DiffNR: Diffusion-Enhanced Neural Representation Optimization for Sparse-View 3D Tomographic Reconstruction
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond