Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

Attention Residuals

Grounding World Simulation Models in a Real-World Metropolis
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data
AI Can Learn Scientific Taste
MM-CondChain: A Programmatically Verified Benchmark for Visually Grounded Deep Compositional Reasoning
Can Vision-Language Models Solve the Shell Game?
OmniForcing: Unleashing Real-time Joint Audio-Visual Generation
daVinci-Env: Open SWE Environment Synthesis at Scale
Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation
LMEB: Long-horizon Memory Embedding Benchmark
DreamVideo-Omni: Omni-Motion Controlled Multi-Subject Video Customization with Latent Identity Reinforcement Learning
ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation
Video-Based Reward Modeling for Computer-Use Agents
IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections
Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training
Can Large Language Models Keep Up? Benchmarking Online Adaptation to Continual Knowledge Streams
ReMix: Reinforcement routing for mixtures of LoRAs in LLM finetuning
In-Context Reinforcement Learning for Tool Use in Large Language Models
MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents
Flash-KMeans: Fast and Memory-Efficient Exact K-Means
OpenClaw-RL: Train Any Agent Simply by Talking
Stepping VLMs onto the Court: Benchmarking Spatial Intelligence in Sports
InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing
MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data
Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs
Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion
Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing
CARE-Edit: Condition-Aware Routing of Experts for Contextual Image Editing
Believe Your Model: Distribution-Guided Confidence Calibration
LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory
How Far Can Unsupervised RLVR Scale LLM Training?