Audio Classification on VGGSound
Metrics
Top 1 Accuracy
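The page does not define the metric, so the following is a minimal sketch of the standard definition, assuming one ground-truth label per clip: top-1 accuracy is the percentage of clips whose highest-scoring predicted class equals the label. The function name `top1_accuracy` and the toy scores are illustrative assumptions and are not taken from any of the listed papers.

```python
def top1_accuracy(scores, labels):
    """Return top-1 accuracy as a percentage.

    scores: list of per-clip score lists, one score per class.
    labels: list of integer class indices, one per clip.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one example is required")

    correct = 0
    for row, label in zip(scores, labels):
        # Index of the highest-scoring class for this clip.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Toy data (hypothetical): 4 clips, 3 classes.
scores = [
    [0.1, 0.7, 0.2],  # predicts class 1
    [0.5, 0.3, 0.2],  # predicts class 0
    [0.2, 0.2, 0.6],  # predicts class 2
    [0.4, 0.5, 0.1],  # predicts class 1
]
labels = [1, 0, 2, 0]

accuracy = top1_accuracy(scores, labels)
print(f"{accuracy:.1f}")  # 75.0
```

Three of the four toy predictions match their labels, so the sketch reports 75.0, on the same 0 to 100 scale as the scores listed on this page.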
Results
Performance results of various models on this benchmark
| Model name | Top 1 Accuracy | Paper Title |
| --- | --- | --- |
| Mirasol3B | 69.8 | Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities |
| ONE-PEACE (Audio-Visual) | 68.2 | ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities |
| MAViL | 67.1 | - |
| EquiAV | 67.1 | EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning |
| MMT (Audio-Visual) | 66.2 | Multiscale Multimodal Transformer for Multimodal Action Recognition |
| CAV-MAE (Audio-Visual) | 65.9 | Contrastive Audio-Visual Masked Autoencoder |
| UAVM (Audio + Video) | 65.8 | UAVM: Towards Unifying Audio and Visual Models |
| Audiovisual Masked Autoencoder (Audiovisual, Single) | 65.0 | Audiovisual Masked Autoencoders |
| AVT (Audio-Visual) | 63.9 | AVT: Audio-Video Transformer for Multimodal Action Recognition |
| ONE-PEACE (Audio-Only) | 59.6 | ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities |
| CAV-MAE (Audio-Only) | 59.5 | Contrastive Audio-Visual Masked Autoencoder |
| Audiovisual Masked Autoencoder (Audio-only, Single) | 57.2 | Audiovisual Masked Autoencoders |
| MAST (Audio Only) | 57.0 | Multiscale Audio Spectrogram Transformer for Efficient Audio Classification |
| UAVM (Audio Only) | 56.5 | UAVM: Towards Unifying Audio and Visual Models |
| MMT (Video) | 56.1 | Multiscale Multimodal Transformer for Multimodal Action Recognition |
| PlayItBackX3 | 53.7 | Play It Back: Iterative Attention for Audio Recognition |
| AVT (V) | 53.2 | AVT: Audio-Video Transformer for Multimodal Action Recognition |
| MBT (A) | 52.3 | Attention Bottlenecks for Multimodal Fusion |
| MBT (V) | 51.2 | Attention Bottlenecks for Multimodal Fusion |
| UAVM (Video Only) | 49.9 | UAVM: Towards Unifying Audio and Visual Models |