Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation
SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents
Yume-1.5: A Text-Controlled Interactive World Generation Model
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
LongFly: Long-Horizon UAV Vision-and-Language Navigation with Spatiotemporal Context Integration
Attention Is Not What You Need
SlideTailor: Personalized Presentation Slide Generation for Scientific Papers
InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion
Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding
Measuring short-form factuality in large language models
DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents
MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents
AI-Trader: Benchmarking Autonomous Agents in Real-Time Financial Markets
Latent Implicit Visual Reasoning
LLM Personas as a Substitute for Field Experiments in Method Benchmarking
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming
TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior
Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models
DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation
T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation
TongSIM: A General Platform for Simulating Intelligent Machines
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition
RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic
A Real-World Evaluation of LLM Medication Safety Reviews in NHS Primary Care
Multi-LLM Thematic Analysis with Dual Reliability Metrics: Combining Cohen's Kappa and Semantic Similarity for Qualitative Research Validation
Active Intelligence in Video Avatars via Closed-loop World Modeling
FaithLens: Detecting and Explaining Faithfulness Hallucination
SAM Audio: Segment Anything in Audio