Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once

EmbRACE-3K: Embodied Reasoning and Action in Complex Environments
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation
VerifyBench: A Systematic Benchmark for Evaluating Reasoning Verifiers Across Domains
Sidechain conditioning and modeling for full-atom protein sequence design with FAMPNN
One Token to Fool LLM-as-a-Judge
From One to More: Contextual Part Latents for 3D Generation
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective
Neural-Driven Image Editing
KV Cache Steering for Inducing Reasoning in Small Language Models
NeuralOS: Towards Simulating Operating Systems via Neural Generative Models
CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering
Test-Time Scaling with Reflective Generative Model
System-of-systems Modeling and Optimization: An Integrated Framework for Intermodal Mobility
All-atom Diffusion Transformers: Unified generative modelling of molecules and materials
OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
MIRIX: Multi-Agent Memory System for LLM-Based Agents
Skywork-R1V3 Technical Report
T-LoRA: Single Image Diffusion Model Customization Without Overfitting
Scaling RL to Long Videos
Critiques of World Models
Is Diversity All You Need for Scalable Robotic Manipulation?
Nile-Chat: Egyptian Language Models for Arabic and Latin Scripts
GTA1: GUI Test-time Scaling Agent
MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos
RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
The User-Centric Geo-Experience: An LLM-Powered Framework for Enhanced Planning, Navigation, and Dynamic Adaptation
PLAME: Leveraging Pretrained Language Models to Generate Enhanced Protein Multiple Sequence Alignments
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization