HyperAI

3D Object Detection on nuScenes Camera Only

Metrics

Future Frame
NDS
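
NDS is the nuScenes Detection Score, which combines mean average precision (mAP) with five mean true-positive error terms (translation, scale, orientation, velocity, attribute). The sketch below follows the published nuScenes definition, NDS = (1/10) * [5 * mAP + sum over the five errors of (1 - min(1, error))]. The names `nds` and `TP_METRICS` and the input values are illustrative assumptions, not taken from any model in this leaderboard. The leaderboard reports the score on a 0 to 100 scale.

```python
from typing import Mapping

# The five true-positive error metrics used by the nuScenes detection task.
TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")


def nds(mean_ap: float, tp_errors: Mapping[str, float]) -> float:
    """Return the nuScenes Detection Score in the range [0, 1].

    mean_ap   -- mAP in [0, 1]
    tp_errors -- mapping from each name in TP_METRICS to its mean error (>= 0)
    """
    # Each error is clipped at 1 and turned into a score where higher is better.
    tp_scores = [1.0 - min(1.0, tp_errors[name]) for name in TP_METRICS]
    # mAP carries a weight of 5, each of the five TP scores a weight of 1.
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0


# Made-up inputs for illustration only.
example_errors = {"mATE": 0.5, "mASE": 0.25, "mAOE": 0.5, "mAVE": 0.25, "mAAE": 0.25}
example_score = nds(0.5, example_errors)
print(f"NDS on the leaderboard scale: {100 * example_score:.1f}")
```

Because each error is clipped at 1, a very poor velocity or orientation estimate can zero out its own term but cannot push the overall score below the mAP contribution.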

Results

Performance results of various models on this benchmark

| Model name | Future Frame | NDS | Paper Title |
| --- | --- | --- | --- |
| Far3D | no | 68.7 | Far3D: Expanding the Horizon for Surround-view 3D Object Detection |
| BEVDet4D | no | 56.9 | BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection |
| CAPE | no | 62.8 | CAPE: Camera View Position Embedding for Multi-View 3D Object Detection |
| BEVDepth-pure | no | 60.9 | BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection |
| PETRv2-pure | no | 59.2 | PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images |
| SA-BEV | no | 62.4 | SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view 3D Object Detection |
| SOLOFusion-pure | no | 61.9 | Time Will Tell: New Outlooks and A Baseline for Temporal Multi-View 3D Object Detection |
| StreamPETR-Large | no | 67.6 | Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection |
| HoP | yes | 68.5 | Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction |
| BEVStereo | no | 61.0 | BEVStereo: Enhancing Depth Estimation in Multi-view 3D Object Detection with Dynamic Temporal Stereo |
| RayDN | no | 68.6 | Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection |
| BEVDistill | no | 59.4 | BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object Detection |
| PolarFormer | no | 57.2 | PolarFormer: Multi-camera 3D Object Detection with Polar Transformer |
| SparseBEV (V2-99) | yes | 67.5 | SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos |
| GeoBEV (V2-99) | no | 66.2 | GeoBEV: Learning Geometric BEV Representation for Multi-view 3D Object Detection |
| BEVFormer v2 (InternImage-XL) | yes | 63.4 | BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision |
| VCD-A | no | 67.2 | Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection |
| SeaBird | no | 59.7 | SeaBird: Segmentation in Bird's View with Dice Loss Improves Monocular 3D Detection of Large Objects |
| BEVFormer | no | 56.9 | BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers |