Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

Text to Robotic Assembly of Multi Component Objects using 3D Generative AI and Vision Language Models

Kosmos: An AI Scientist for Autonomous Discovery

Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR
Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer
When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs
Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation
The AI Productivity Index (APEX)
Chain-of-Frames: Advancing Video Understanding in Multimodal LLMs via Frame-Aware Reasoning
Towards Robust Mathematical Reasoning
Towards a future space-based, highly scalable AI infrastructure system design
PHUMA: Physically-Grounded Humanoid Locomotion Dataset
UniREditBench: A Unified Reasoning-based Image Editing Benchmark
Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph
UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback
The Underappreciated Power of Vision Models for Graph Structural Understanding
Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation
NOBLE - Neural Operator with Biologically-informed Latent Embeddings to Capture Experimental Variability in Biological Neuron Models
Glia: A Human-Inspired AI for Automated Systems Design and Optimization
Context Engineering 2.0: The Context of Context Engineering
Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning
Continuous Autoregressive Language Models
πRL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models
INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows
The Era of Agentic Organization: Learning to Organize with Language Models
SPICE: Self-Play In Corpus Environments Improves Reasoning
Surfer 2: The Next Generation of Cross-Platform Computer Use Agents
Exploring Conditions for Diffusion models in Robotic Control
Can Agent Conquer Web? Exploring the Frontiers of ChatGPT Atlas Agent in Web Games