Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

3EED: Ground Everything Everywhere in 3D

DetectiumFire: A Comprehensive Multi-modal Dataset Bridging Vision and Language for Fire Understanding

CHIP: A multi-sensor dataset for 6D pose estimation of chairs in industrial settings
Geometrically-Constrained Agent for Spatial Reasoning
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
DiP: Taming Diffusion Models in Pixel Space
Architecture Decoupling Is Not All You Need For Unified Multimodal Model
Vision Bridge Transformer at Scale
AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement
REASONEDIT: Towards Reasoning-Enhanced Image Editing Models
OpenApps: Simulating Environment Variations to Measure UI-Agent Reliability
Qwen3-VL Technical Report
G2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning
Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following
MIRA: Multimodal Iterative Reasoning Agent for Image Editing
ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction
Canvas-to-Image: Compositional Image Generation with Multimodal Controls
Video Generation Models Are Good Latent Reward Models
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
Think Visually, Reason Textually: Vision-Language Synergy in ARC
Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy
Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation
Latent Collaboration in Multi-Agent Systems
Multimodal Evaluation of Russian-language Architectures
ROOT: Robust Orthogonalized Optimizer for Neural Network Training
Superposition Yields Robust Neural Scaling
Optimal Mistake Bounds for Transductive Online Learning
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Why Diffusion Models Don’t Memorize: The Role of Implicit Dynamical Regularization in Training
1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free