HyperAI超神経

Action Recognition in Videos on Something-Something V1

Evaluation Metrics

GFLOPs
Param.
Top 1 Accuracy
Top 5 Accuracy
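
As a rough illustration of the two accuracy metrics listed above, the following sketch computes top-k accuracy from per-class scores. The function name `top_k_accuracy` and the toy scores and labels are assumptions made for illustration only; they are not taken from the benchmark or from any listed model's evaluation code.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 1,
) -> float:
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the best k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return 100.0 * hits / len(labels)


# Hypothetical toy data: 4 clips, 3 classes (a real benchmark has far more classes).
scores = [
    [0.1, 0.7, 0.2],  # best class 1, true label 1
    [0.5, 0.3, 0.2],  # best class 0, true label 1 (second best)
    [0.2, 0.3, 0.5],  # best class 2, true label 0 (worst)
    [0.6, 0.1, 0.3],  # best class 0, true label 0
]
labels = [1, 1, 0, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(top1, top2)
```

Top 5 Accuracy is the same computation with `k=5`. The toy example uses `k=2` only because it has three classes.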

Evaluation Results

Performance results of each model on this benchmark

Comparison Table
Model | GFLOPs | Param. | Top 1 Accuracy | Top 5 Accuracy
diverse-temporal-aggregation-and-depthwise | 9.3x6 | 5.8M | 49.5 | 78.0
video-classification-with-channel-separated | - | - | 53.3 | -
videomae-v2-scaling-video-masked-autoencoders | - | - | 68.7 | 91.9
hierarchical-feature-aggregation-networks-for | - | - | 41.97 | -
eco-efficient-convolutional-network-for | - | - | 46.4 | -
diverse-temporal-aggregation-and-depthwise | 11.5x6 | 3.3M | 49.8 | 78.0
spatiotemporal-self-attention-modeling-with | - | - | 58.3 | -
diverse-temporal-aggregation-and-depthwise | 20.9x6 | 5.8M | 54.59 | 82.30
temporally-adaptive-models-for-efficient | - | - | 60.7 | -
temporally-adaptive-models-for-efficient | - | - | 63.7 | -
multi-scale-motion-aware-module-for-video | - | - | 57.9 | -
temporal-shift-module-for-efficient-video | - | - | 50.7 | -
slow-fast-visual-tempo-learning-for-video | - | - | 57.2 | -
mars-motion-augmented-rgb-stream-for-action | - | - | 40.4 | -
ean-event-adaptive-network-for-enhanced | - | - | 57.2 | 83.9
moments-in-time-dataset-one-million-videos | - | - | 48.6 | -
temporal-relational-reasoning-in-videos | - | - | 42.01 | -
video-classification-with-channel-separated | - | - | 48.4 | -
learning-self-similarity-in-space-and-time-as-1 | - | - | 54.3 | 82.9
ct-net-channel-tensorization-network-for-1 | - | - | 56.6 | -
uniformerv2-spatiotemporal-learning-by-arming | - | - | 62.7 | 88.0
non-local-neural-networks | - | - | 44.4 | -
internvideo-general-video-foundation-models | - | - | 70.0 | -
190807625 | - | - | 53.4 | -
relational-self-attention-what-s-missing-in | - | - | 56.1 | 82.8
temporal-shift-module-for-efficient-video | - | - | 49.7 | 78.5
video-classification-with-finecoarse-networks | - | - | 57.1 | 84.2
tds-clip-temporal-difference-side-network-for | - | - | 63.0 | 87.8
motionsqueeze-neural-motion-feature-learning | - | - | 52.1 | 82.3
diverse-temporal-aggregation-and-depthwise | 5.7x6 | 3.3M | 48.1 | 76.9
learning-correlation-structures-for-vision | - | - | 61.3 | -
video-classification-with-channel-separated | - | - | 51.6 | -
motion-feature-network-fixed-motion-filter | - | - | 43.9 | -
moments-in-time-dataset-one-million-videos | - | - | 50 | -
uniformer-unified-transformer-for-efficient | 41.8x3 | 21.4 | 57.6 | 84.9
temporal-reasoning-graph-for-activity | - | - | 49.5 | 86.1
pan-towards-fast-action-recognition-via | - | - | 55.3 | 82.8
relational-self-attention-what-s-missing-in | - | - | 51.9 | 79.6
relational-self-attention-what-s-missing-in | - | - | 54.0 | 81.1
temporal-shift-module-for-efficient-video | - | - | 47.2 | 77.1
spatial-temporal-pyramid-graph-reasoning-for | - | - | 53.5 | -
rethinking-spatiotemporal-feature-learning | - | - | 48.2 | 78.7
region-based-non-local-operation-for-video | - | - | 52.7 | 81.5
side4video-spatial-temporal-side-network-for | - | - | 67.3 | 88.8
mlp-3d-a-mlp-like-3d-architecture-with-1 | - | - | 56.5 | -
uniformer-unified-transformer-for-efficient | 259x3 | 50.1 | 60.9 | 87.3
video-classification-with-channel-separated | - | - | 52.1 | -
temporal-reasoning-graph-for-activity | - | - | 49.7 | -
ae-net-adjoint-enhancement-network-for | - | - | 55.0 | -
knowing-what-where-and-when-to-look-efficient | - | - | 52.6 | 81.3
gate-shift-networks-for-video-action | - | - | 55.16 | -
stand-alone-inter-frame-attention-in-video-1 | - | - | 57.3 | -
video-classification-with-channel-separated | - | - | 49.3 | -
action-recognition-with-motion | - | - | 56.6 | -
what-can-simple-arithmetic-operations-do-for | - | - | 65.6 | 88.6
videos-as-space-time-region-graphs | - | - | 46.1 | -
learning-self-similarity-in-space-and-time-as-1 | - | - | 56.6 | 84.4
recurrent-space-time-graphs-for-video | - | - | 49.2 | -
temporal-relational-reasoning-in-videos | - | - | 34.4 | -
motionsqueeze-neural-motion-feature-learning | - | - | 55.1 | -
motionsqueeze-neural-motion-feature-learning | - | - | 50.9 | 80.3
relational-self-attention-what-s-missing-in | - | - | 55.5 | 82.6
learning-self-similarity-in-space-and-time-as-1 | - | - | 55.8 | 83.9
action-keypoint-network-for-efficient-video | - | - | 52.5 | -
mvfnet-multi-view-fusion-network-for | - | - | 54.0 | -
diverse-temporal-aggregation-and-depthwise | 11.5x6 | 3.3M | 52.68 | 80.43
diverse-temporal-aggregation-and-depthwise | 20.9x6 | 5.8M | 50.6 | 78.7
motionsqueeze-neural-motion-feature-learning | - | - | 54.4 | 83.8
region-based-non-local-operation-for-video | - | - | 54.1 | 82.2
gate-shift-networks-for-video-action | - | - | 51.68 | -
tdn-temporal-difference-networks-for | - | - | 56.8 | 84.1
mars-motion-augmented-rgb-stream-for-action | - | - | 53.0 | -
eco-efficient-convolutional-network-for | - | - | 46.4 | -
rethinking-spatiotemporal-feature-learning | - | - | 47.3 | 78.1