
Cross-view Transformers for real-time Map-view Semantic Segmentation

Brady Zhou, Philipp Krähenbühl

Abstract

We present cross-view transformers, an efficient attention-based model for map-view semantic segmentation from multiple cameras. Our architecture implicitly learns a mapping from individual camera views into a canonical map-view representation using a camera-aware cross-view attention mechanism. Each camera uses positional embeddings that depend on its intrinsic and extrinsic calibration. These embeddings allow a transformer to learn the mapping across different views without ever explicitly modeling it geometrically. The architecture consists of a convolutional image encoder for each view and cross-view transformer layers to infer a map-view semantic segmentation. Our model is simple, easily parallelizable, and runs in real-time. The presented architecture performs at state-of-the-art on the nuScenes dataset, with 4x faster inference speeds. Code is available at https://github.com/bradyz/cross_view_transformers.
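The linked repository contains the authors' implementation. As a rough illustration of the core idea only, the following minimal PyTorch sketch shows one way camera-aware cross-view attention could be structured: a grid of learned map-view queries attends to image features from all cameras, with keys augmented by an embedding of per-pixel camera rays derived from each camera's intrinsic and extrinsic calibration. The class name, tensor shapes, and the ray-direction embedding are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    """Illustrative sketch of camera-aware cross-view attention.

    Learned map-view query embeddings attend to image features from
    every camera. Each camera's keys carry a positional embedding of
    its un-projected pixel ray directions (a function of intrinsic and
    extrinsic calibration), so attention can learn the image-to-map
    correspondence without explicit geometric modeling.
    """

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ray_embed = nn.Linear(3, dim)  # embeds unit ray directions

    def forward(self, map_queries, image_feats, rays):
        # map_queries: (B, H_map*W_map, dim)       learned map-view grid embeddings
        # image_feats: (B, N_cams*H_img*W_img, dim) flattened per-camera features
        # rays:        (B, N_cams*H_img*W_img, 3)   per-pixel ray directions,
        #              computed from each camera's calibration
        keys = image_feats + self.ray_embed(rays)  # camera-aware keys
        out, _ = self.attn(map_queries, keys, image_feats)
        return out

# Hypothetical usage with 6 cameras and a 25x25 map-view grid:
attn = CrossViewAttention(dim=128)
q = torch.randn(2, 25 * 25, 128)
k = torch.randn(2, 6 * 14 * 30, 128)
r = torch.nn.functional.normalize(torch.randn(2, 6 * 14 * 30, 3), dim=-1)
out = attn(q, k, r)  # (2, 625, 128) map-view features
```

In the paper's full architecture, layers of this kind sit on top of a convolutional encoder per camera view, and a decoder head turns the resulting map-view features into the semantic segmentation.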

