Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

DeepSeek-OCR 2: Visual Causal Flow

Learning to Discover at Test Time

Eliciting Harmful Capabilities by Fine-Tuning On Safeguarded Outputs
Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory
Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification
VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers
SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents
LongCat-Flash-Thinking-2601 Technical Report
Can Language Models Discover Scaling Laws?
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning
Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
BayesianVLA: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
LLM-in-Sandbox Elicits General Agentic Intelligence
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding
EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience
HY-MT1.5 Technical Report
Scaling Laws for Code: Every Programming Language Matters
Qwen3-TTS Technical Report
Small Models, Big Results: Achieving Superior Intent Extraction through Decomposition
FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments
MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents
DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution
Rethinking Video Generation Model for the Embodied World
Paper2Rebuttal: A Multi-Agent Framework for Transparent Author Response Assistance
Agentic Reasoning for Large Language Models
PersonaPlex: Voice and Role Control for Full Duplex Conversational Speech Models
FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning
MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models
OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer