Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

Kimi Linear: An Expressive, Efficient Attention Architecture

Emu3.5: Native Multimodal Models are World Learners

The End of Manual Decoding: Towards Truly End-to-End Language Models
Human-AI Complementarity: A Goal for Amplified Oversight
GPTOpt: Towards Efficient LLM-Based Black-Box Optimization
VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning
Reasoning-Aware GRPO using Process Mining
Scaling Latent Reasoning via Looped Language Models
ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization
Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence
MCP-Flow: Facilitating LLM Agents to Master Real-World, Diverse and Scaling MCP Tools
OmniCast: A Masked Latent Diffusion Model for Weather Forecasting Across Time Scales
Uniform Discrete Diffusion with Metric Path for Video Generation
Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents
RoboOmni: Proactive Robot Manipulation in Omni-modal Context
AgentFold: Long-Horizon Web Agents with Proactive Context Management
Tongyi DeepResearch Technical Report
InteractComp: Evaluating Search Agents With Ambiguous Queries
VLM-SlideEval: Evaluating VLMs on Structured Comprehension and Perturbation Sensitivity in PPT
TeraSim-World: Worldwide Safety-Critical Data Synthesis for End-to-End Autonomous Driving
Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation
VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting
FARMER: Flow AutoRegressive Transformer over Pixels
A Survey of Data Agents: Emerging Paradigm or Overstated Hype?
ReCode: Unify Plan and Action for Universal Granularity Control
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations
Magellan: Guided MCTS for Latent Space Exploration and Novelty Generation
DEEDEE: Fast and Scalable Out-of-Distribution Dynamics Detection
Sparser Block-Sparse Attention via Token Permutation
A Definition of AGI
From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model