HyperAI

Action Recognition on EPIC-KITCHENS-100

Metrics

Action@1
GFLOPs
Noun@1
Verb@1
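
As a rough illustration of how the three accuracy metrics relate, the sketch below assumes the usual convention for this benchmark: Verb@1 and Noun@1 are top-1 accuracies over the verb and noun labels, and Action@1 counts a clip as correct only when both its top-ranked verb and its top-ranked noun match the ground truth. The function name `top1_accuracy` and the toy labels are hypothetical and are not taken from any leaderboard entry.

```python
def top1_accuracy(predicted, actual):
    """Return the percentage of positions where the prediction equals the label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one sample is required")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)


# Hypothetical toy data: four clips with ground-truth and top-ranked predictions.
true_verbs = ["cut", "open", "wash", "take"]
true_nouns = ["onion", "fridge", "pan", "knife"]
pred_verbs = ["cut", "open", "wash", "put"]
pred_nouns = ["onion", "door", "pan", "knife"]

verb_at_1 = top1_accuracy(pred_verbs, true_verbs)
noun_at_1 = top1_accuracy(pred_nouns, true_nouns)

# Assumption: an action is the (verb, noun) pair, so both parts must be correct.
action_at_1 = top1_accuracy(
    list(zip(pred_verbs, pred_nouns)),
    list(zip(true_verbs, true_nouns)),
)

print(f"Verb@1={verb_at_1:.1f} Noun@1={noun_at_1:.1f} Action@1={action_at_1:.1f}")
```

Under this assumption Action@1 can never exceed the lower of Verb@1 and Noun@1, which is consistent with every row in the table where all three values are reported.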

Results

Performance results of various models on this benchmark

Comparison table
| Model name | Action@1 | GFLOPs | Noun@1 | Verb@1 |
| --- | --- | --- | --- | --- |
| movinets-mobile-video-networks-for-efficient | 44.5 | 74.9x1 | 55.1 | 69.1 |
| training-a-large-video-model-on-a-single | 54.4 | - | 65.4 | 73.0 |
| memvit-memory-augmented-multiscale-vision | 48.4 | - | 60.3 | 71.4 |
| rescaling-egocentric-vision | 36.81 | - | - | - |
| movinets-mobile-video-networks-for-efficient | 41.2 | 7.59x1 | 52.3 | 67.1 |
| rescaling-egocentric-vision | 33.57 | - | - | - |
| gate-shift-fuse-for-video-action-recognition | 44.48 | - | 53.18 | 69.06 |
| 2103-15691 | 44.0 | - | 56.8 | 66.4 |
| temporally-adaptive-models-for-efficient | 48.9 | - | 60.2 | 71.0 |
| object-region-video-transformers-1 | 45.7 | - | 58.7 | 68.4 |
| cast-cross-attention-in-space-and-time-for-1 | 49.3 | - | 60.9 | 72.5 |
| keeping-your-eye-on-the-ball-trajectory | 44.5 | - | 58.5 | 67.0 |
| technical-report-temporal-aggregate | 45.26 | - | 53.35 | 66 |
| learning-video-representations-from-large | 51 | - | 62.9 | 72 |
| multiscale-multimodal-transformer-for | 47.8 | - | 61.0 | 70.1 |
| attention-bottlenecks-for-multimodal-fusion | 43.4 | - | 58 | 64.8 |
| keeping-your-eye-on-the-ball-trajectory | 44.1 | - | 57.6 | 67.1 |
| keeping-your-eye-on-the-ball-trajectory | 43.1 | - | 56.5 | 66.7 |
| m-m-mix-a-multimodal-multiview-transformer | 53.6 | - | 66.3 | 72.0 |
| omnivore-a-single-model-for-many-visual | 49.9 | - | 61.7 | 69.5 |
| multiview-transformers-for-video-recognition | 50.5 | - | 63.9 | 69.9 |
| rescaling-egocentric-vision | 35.55 | - | - | - |
| rescaling-egocentric-vision | 35.28 | - | - | - |
| temporally-adaptive-models-for-efficient | 51.8 | - | 64.1 | 71.7 |
| rescaling-egocentric-vision | 37.39 | - | - | - |
| movinets-mobile-video-networks-for-efficient | 47.7 | 117x1 | 57.3 | 72.2 |
| avt-audio-video-transformer-for-multimodal | 47.2 | - | 59.3 | 70.4 |
| movinets-mobile-video-networks-for-efficient | 44.4 | 42.2x1 | 56.2 | 68.8 |
| movinets-mobile-video-networks-for-efficient | 36.8 | 1.74x1 | 47.4 | 64.8 |
| extending-video-masked-autoencoders-to-128-1 | 52.1 | - | 61.8 | 75.0 |