HyperAI

Audio Classification On Audioset

Metrics

Test mAP
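
For readers unfamiliar with the metric, the following is a minimal sketch of how a mean average precision (mAP) score can be computed for multi-label audio tagging. It assumes the leaderboard value is the macro average of per-class average precision, which is the usual convention for AudioSet; the page itself does not state the exact evaluation protocol. The function names `average_precision` and `mean_average_precision` are illustrative and not taken from any of the listed papers.

```python
from typing import List, Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision for one class (assumed definition: mean of the
    precision values measured at the rank of each positive clip)."""
    # Rank clips from highest to lowest score; ties keep their input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("class has no positive clips; AP is undefined")
    return precision_sum / hits


def mean_average_precision(
    label_matrix: List[List[int]], score_matrix: List[List[float]]
) -> float:
    """Macro-average of per-class AP. Rows are clips, columns are classes."""
    num_classes = len(label_matrix[0])
    per_class = []
    for c in range(num_classes):
        labels = [row[c] for row in label_matrix]
        scores = [row[c] for row in score_matrix]
        per_class.append(average_precision(labels, scores))
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    # Toy data: 4 clips, 2 classes (hypothetical values for illustration only).
    y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
    y_score = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.1]]
    print(round(mean_average_precision(y_true, y_score), 4))
```

In practice, a library routine such as scikit-learn's `average_precision_score` with `average="macro"` is typically used instead; the sketch only makes the ranking-based definition explicit.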

Results

Performance results of various models on this benchmark

| Model name | Test mAP | Paper title | Repository |
| --- | --- | --- | --- |
| EAT | 0.486 | EAT: Self-Supervised Pre-Training with Efficient Audio Transformer | - |
| mn40_as (Single) | 0.483 | Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation | - |
| M2D-AS/0.7 | 0.485 | Masked Modeling Duo: Towards a Universal Audio Pre-training Framework | - |
| MAViL (Audio-Visual, single) | 0.533 | - | - |
| EAT-S | 0.405 | End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network | - |
| BEATs (Audio-only, Single) | 0.486 | BEATs: Audio Pre-Training with Acoustic Tokenizers | - |
| CAV-MAE (Audio-Visual) | 0.512 | Contrastive Audio-Visual Masked Autoencoder | - |
| L3 | 0.249 | Look, Listen and Learn | - |
| OmniVec2 | 0.558 | OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning | - |
| PSLA (Single) | 0.443 | PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation | - |
| EAT-M | 0.426 | End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network | - |
| ATST-Frame | 0.480 | Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks | - |
| CAV-MAE (Audio-Only) | 0.466 | Contrastive Audio-Visual Masked Autoencoder | - |
| DTF-AT (Single) | 0.486 | DTF-AT: Decoupled Time-Frequency Audio Transformer for Event Classification | |
| Audiovisual Masked Autoencoder (Audio-only, Single) | 0.466 | Audiovisual Masked Autoencoders | - |
| AST (Ensemble) | 0.485 | AST: Audio Spectrogram Transformer | - |
| BEATs (Audio-only, Ensemble) | 0.506 | BEATs: Audio Pre-Training with Acoustic Tokenizers | - |
| EquiAV | 0.546 | EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning | - |
| MMV | 0.309 | Self-Supervised MultiModal Versatile Networks | - |
| CAV-MAE (Visual-Only) | 0.262 | Contrastive Audio-Visual Masked Autoencoder | - |