HyperAI
Audio Classification On Audioset
Metrics
Test mAP
Results
Performance results of various models on this benchmark
| Model name | Test mAP | Paper title |
|---|---|---|
| OmniVec2 | 0.558 | OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning |
| OmniVec | 0.548 | OmniVec: Learning robust representations with cross modal sharing |
| EquiAV | 0.546 | EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning |
| MAViL (Audio-Visual, single) | 0.533 | - |
| Audiovisual Masked Autoencoder (Audiovisual, Single) | 0.518 | Audiovisual Masked Autoencoders |
| CAV-MAE (Audio-Visual) | 0.512 | Contrastive Audio-Visual Masked Autoencoder |
| BEATs (Audio-only, Ensemble) | 0.506 | BEATs: Audio Pre-Training with Acoustic Tokenizers |
| UAVM (Audio + Video) | 0.504 | UAVM: Towards Unifying Audio and Visual Models |
| SSLAM (Audio-Only, Single) | 0.502 | SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes |
| mn40_as (Ensemble) | 0.498 | Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation |
| ATST-C2F (Single) | 0.497 | Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks |
| MBT (AS-500K training + Video) | 0.496 | Attention Bottlenecks for Multimodal Fusion |
| PaSST (Ensemble) | 0.496 | Efficient Training of Audio Transformers with Patchout |
| DyMN-L (Audio-Only, Single) | 0.490 | Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models |
| HTS-AT (Ensemble) | 0.487 | HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection |
| EAT | 0.486 | EAT: Self-Supervised Pre-Training with Efficient Audio Transformer |
| BEATs (Audio-only, Single) | 0.486 | BEATs: Audio Pre-Training with Acoustic Tokenizers |
| DTF-AT (Single) | 0.486 | DTF-AT: Decoupled Time-Frequency Audio Transformer for Event Classification |
| M2D-AS/0.7 | 0.485 | Masked Modeling Duo: Towards a Universal Audio Pre-training Framework |
| AST (Ensemble) | 0.485 | AST: Audio Spectrogram Transformer |