Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning

OpenAutoNLU: Open Source AutoML Library for NLU

OmniLottie: Generating Vector Animations via Parameterized Lottie Tokens
From Scale to Speed: Adaptive Test-Time Scaling for Image Editing
Multi-agent cooperation through in-context co-player inference
ACTIONENGINE: From Reactive to Programmatic GUI Agents via State Machine Memory
CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era
Mode Seeking meets Mean Seeking for Fast Long Video Generation
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets
Enhancing Spatial Understanding in Image Generation via Reward Modeling
dLLM: Simple Diffusion Language Modeling
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
Imagination Helps Visual Reasoning, But Not Yet in Latent Space
OmniGAIA: Towards Native Omni-Modal AI Agents
MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios
From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models
The Trinity of Consistency as a Defining Principle for General World Models
GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL
SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model
ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning
DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation
MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models
HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation
DREAM: Deep Research Evaluation with Agentic Metrics
LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces
PyVision-RL: Forging Open Agentic Vision Models via RL
From Perception to Action: An Interactive Benchmark for Vision Reasoning
Query-focused and Memory-aware Reranker for Long Context Processing
On Data Engineering for Scaling LLM Terminal Capabilities
DSDR: Dual-Scale Diversity Regularization for Exploration in LLM Reasoning
Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device