Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

Video-As-Prompt: Unified Semantic Control for Video Generation
DeepAgent: A General Reasoning Agent with Scalable Toolsets
Uncertainty-Aware Multi-Objective Reinforcement Learning-Guided Diffusion Models for 3D De Novo Molecular Design
Reac-Discovery: an artificial intelligence–driven platform for continuous-flow catalytic reactor discovery and optimization
BoltzGen: Toward Universal Binder Design
HSCodeComp: A Realistic and Expert-level Benchmark for Deep Search Agents in Hierarchical Rule Application
DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion
HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders
Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1
See the Text: From Tokenization to Visual Reading
Directional Reasoning Injection for Fine-Tuning MLLMs
Language Models are Injective and Hence Invertible
The Free Transformer
Quantum Processing Unit (QPU) processing time Prediction with Machine Learning
Observation of constructive interference at the edge of quantum ergodicity
VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos
GigaBrain-0: A World Model-Powered Vision-Language-Action Model
LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning
Color Me Correctly: Bridging Perceptual Color Spaces and Text Embeddings for Improved Diffusion Generation
Spatial Reasoning with Vision-Language Models in Ego-Centric Multi-View Scenes
LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning in Open-World Scenarios
FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies
Inpainting-Guided Policy Optimization for Diffusion Large Language Models
MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools
A Survey on Cache Methods in Diffusion Models: Toward Efficient Multi-Modal Generation
Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks
Spatially-Varying Autofocus