Audio Classification on AudioSet
Metrics
Test mAP
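The page does not say how Test mAP is computed. As a rough illustration only, the sketch below uses the common definition of mean average precision for multi-label audio tagging: average precision is computed per class from the ranked clip scores, then averaged over classes. The function names `average_precision` and `mean_average_precision` and the toy data are hypothetical, and the evaluation code used by the listed papers may differ (for example in how ties or classes without positives are handled).

```python
import math


def average_precision(labels, scores):
    """AP for one class: mean of precision@k taken at the rank of each positive clip."""
    # Rank clips from highest to lowest score (Python's sort is stable, so ties keep input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    # Assumption: a class with no positive clips has undefined AP.
    return sum(precisions) / len(precisions) if precisions else math.nan


def mean_average_precision(label_rows, score_rows):
    """Macro-average of per-class AP; rows are clips, columns are classes."""
    num_classes = len(label_rows[0])
    aps = []
    for c in range(num_classes):
        ap = average_precision(
            [row[c] for row in label_rows],
            [row[c] for row in score_rows],
        )
        if not math.isnan(ap):  # assumption: skip classes with undefined AP
            aps.append(ap)
    return sum(aps) / len(aps) if aps else math.nan


# Toy example: 4 clips, 2 classes (hypothetical data, not from the benchmark).
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.7, 0.6], [0.1, 0.9]]
toy_map = mean_average_precision(toy_labels, toy_scores)
```

On the toy data, class 0 has AP 5/6 and class 1 has AP 7/12, so `toy_map` is 17/24, roughly 0.708. The leaderboard values are on this same 0 to 1 scale.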
Results
Performance results of various models on this benchmark
| Model Name | Test mAP | Paper Title |
| --- | --- | --- |
| OmniVec2 | 0.558 | OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning |
| OmniVec | 0.548 | OmniVec: Learning robust representations with cross modal sharing |
| EquiAV | 0.546 | EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning |
| MAViL (Audio-Visual, single) | 0.533 | - |
| Audiovisual Masked Autoencoder (Audiovisual, Single) | 0.518 | Audiovisual Masked Autoencoders |
| CAV-MAE (Audio-Visual) | 0.512 | Contrastive Audio-Visual Masked Autoencoder |
| BEATs (Audio-only, Ensemble) | 0.506 | BEATs: Audio Pre-Training with Acoustic Tokenizers |
| UAVM (Audio + Video) | 0.504 | UAVM: Towards Unifying Audio and Visual Models |
| SSLAM (Audio-Only, Single) | 0.502 | SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes |
| mn40_as (Ensemble) | 0.498 | Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation |
| ATST-C2F (Single) | 0.497 | Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks |
| MBT (AS-500K training + Video) | 0.496 | Attention Bottlenecks for Multimodal Fusion |
| PaSST (Ensemble) | 0.496 | Efficient Training of Audio Transformers with Patchout |
| DyMN-L (Audio-Only, Single) | 0.490 | Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models |
| HTS-AT (Ensemble) | 0.487 | HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection |
| EAT | 0.486 | EAT: Self-Supervised Pre-Training with Efficient Audio Transformer |
| BEATs (Audio-only, Single) | 0.486 | BEATs: Audio Pre-Training with Acoustic Tokenizers |
| DTF-AT (Single) | 0.486 | DTF-AT: Decoupled Time-Frequency Audio Transformer for Event Classification |
| M2D-AS/0.7 | 0.485 | Masked Modeling Duo: Towards a Universal Audio Pre-training Framework |
| AST (Ensemble) | 0.485 | AST: Audio Spectrogram Transformer |