
MambaVision: A Hybrid Mamba-Transformer Vision Backbone

Ali Hatamizadeh, Jan Kautz

Abstract

We propose a novel hybrid Mamba-Transformer backbone, denoted as MambaVision, which is specifically tailored for vision applications. Our core contribution includes redesigning the Mamba formulation to enhance its capability for efficient modeling of visual features. In addition, we conduct a comprehensive ablation study on the feasibility of integrating Vision Transformers (ViT) with Mamba. Our results demonstrate that equipping the Mamba architecture with several self-attention blocks at the final layers greatly improves the modeling capacity to capture long-range spatial dependencies. Based on our findings, we introduce a family of MambaVision models with a hierarchical architecture to meet various design criteria. For image classification on the ImageNet-1K dataset, MambaVision model variants achieve new state-of-the-art (SOTA) performance in terms of Top-1 accuracy and image throughput. In downstream tasks such as object detection, instance segmentation, and semantic segmentation on the MS COCO and ADE20K datasets, MambaVision outperforms comparably-sized backbones and demonstrates more favorable performance. Code: https://github.com/NVlabs/MambaVision.

