Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

SAM 3: Segment Anything with Concepts

GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs
SERES: Semantic-Aware Neural Reconstruction from Sparse Views
SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation
MultiPL-MoE: Multi-Programming-Lingual Extension of Large Language Models through Hybrid Mixture-of-Experts
CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
QSVD: Efficient Low-rank Approximation for Unified Query-Key-Value Weight Compression in Low-Precision Vision-Language Models
Nested Learning: The Illusion of Deep Learning Architectures
SAM 3D: 3Dfy Anything in Images
Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO
First Frame Is the Place to Go for Video Content Customization
Scaling Spatial Intelligence with Multimodal Foundation Models
Step-Audio-R1 Technical Report
V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models
Olmo 3
Early science acceleration experiments with GPT-5
Towards objective and systematic evaluation of bias in artificial intelligence for medical imaging
What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity
Instruction-Guided Lesion Segmentation for Chest X-rays with Automatically Generated Large-Scale Dataset
VisPlay: Self-Evolving Vision-Language Models from Images
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks
VIDEOP2R: Video Understanding from Perception to Reasoning
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation
JAM-2: Fully computational design of drug-like antibodies with high success rates
PathMind: A Retrieve-Prioritize-Reason Framework for Knowledge Graph Reasoning with Large Language Models
REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding
MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs
Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark