HyperAI超神経

Action Recognition In Videos On Something

Evaluation Metrics

Top-1 Accuracy
Top-5 Accuracy
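
As a rough illustration of the two metrics above: Top-k accuracy is commonly taken to be the percentage of clips whose ground-truth label appears among the k highest-scoring predicted classes (k = 1 for Top-1, k = 5 for Top-5). The sketch below assumes that definition; the function name `top_k_accuracy` and the toy scores are hypothetical and do not come from the benchmark or from any listed model.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 1,
) -> float:
    """Percentage of samples whose true label is among the k top-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; keep the first k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return 100.0 * hits / len(labels)


# Hypothetical per-class scores for three clips over four classes.
scores = [
    [0.10, 0.60, 0.20, 0.10],  # best class: 1
    [0.50, 0.10, 0.30, 0.10],  # best class: 0, runner-up: 2
    [0.30, 0.15, 0.05, 0.50],  # best class: 3, runner-up: 0
]
labels = [1, 2, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

With these toy inputs only the first clip is correct at k = 1, while all three are covered at k = 2. Calling the same function with k = 5 on real per-class scores would give a Top-5 figure comparable in form to the values reported below.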

Evaluation Results

Performance results of each model on this benchmark

Comparison Table
| Model | Top-1 Accuracy | Top-5 Accuracy |
| --- | --- | --- |
| temporal-reasoning-graph-for-activity | 62.2 | 90.3 |
| mvfnet-multi-view-fusion-network-for | 66.3 | - |
| keeping-your-eye-on-the-ball-trajectory | 68.1 | 91.2 |
| visual-representation-learning-from-unlabeled | 73.7 | - |
| more-is-less-learning-efficient-video-1 | 65.2 | - |
| spatiotemporal-self-attention-modeling-with | 69.8 | - |
| videomae-masked-autoencoders-are-data-1 | 70.8 | 92.4 |
| motionsqueeze-neural-motion-feature-learning | 66.6 | 90.6 |
| tada-temporally-adaptive-convolutions-for-1 | 64.0 | 88.0 |
| videomae-masked-autoencoders-are-data-1 | 75.4 | 95.2 |
| pan-towards-fast-action-recognition-via | 66.5 | 90.6 |
| movinets-mobile-video-networks-for-efficient | 61.3 | 88.2 |
| keeping-your-eye-on-the-ball-trajectory | 67.1 | 90.6 |
| object-region-video-transformers-1 | 67.9 | 90.5 |
| self-supervised-video-transformer | 59.2 | - |
| comparative-analysis-of-cnn-based | 47.73 | - |
| is-space-time-attention-all-you-need-for | 62.3 | - |
| mutual-modality-learning-for-video-action | 66.83 | 91.30 |
| asymmetric-masked-distillation-for-pre | 70.2 | 92.5 |
| uniformer-unified-transformer-for-efficient | 71.2 | 92.8 |
| asymmetric-masked-distillation-for-pre | 73.3 | 94.0 |
| ct-net-channel-tensorization-network-for-1 | 67.8 | 91.1 |
| improved-multiscale-vision-transformers-for | - | - |
| a-multigrid-method-for-efficiently-training | 61.7 | - |
| video-swin-transformer | 69.6 | 92.7 |
| masked-video-distillation-rethinking-masked | 77.3 | 95.7 |
| mar-masked-autoencoders-for-efficient-action | 69.5 | 91.9 |
| maximizing-spatio-temporal-entropy-of-deep-3d | 65.7 | 89.8 |
| masked-video-distillation-rethinking-masked | 70.9 | 92.8 |
| vimpac-video-pre-training-via-masked-token | 68.1 | - |
| movinets-mobile-video-networks-for-efficient | 62.7 | 89.0 |
| internvideo2-scaling-video-foundation-models | 112 | |
| mlp-3d-a-mlp-like-3d-architecture-with-1 | 68.5 | - |
| the-something-something-video-database-for | 51.33 | 80.46 |
| diverse-temporal-aggregation-and-depthwise | 64.1 | 88.6 |
| movinets-mobile-video-networks-for-efficient | - | - |
| cast-cross-attention-in-space-and-time-for-1 | 71.6 | - |
| learning-self-similarity-in-space-and-time-as-1 | 65.7 | 89.8 |
| relational-self-attention-what-s-missing-in | - | 91.1 |
| space-time-mixing-attention-for-video | 67.2 | 90.8 |
| direcformer-a-directed-attention-in | 64.94 | 87.9 |
| relational-self-attention-what-s-missing-in | 64.8 | 89.1 |
| object-region-video-transformers-1 | 69.5 | 91.5 |
| masked-feature-prediction-for-self-supervised | 75.0 | 95.0 |
| slow-fast-visual-tempo-learning-for-video | 67.8 | - |
| slowfast-networks-for-video-recognition | 61.7 | - |
| masked-video-distillation-rethinking-masked | 76.7 | 95.5 |
| 2103-15691 | 65.4 | 89.8 |
| implicit-temporal-modeling-with-learnable | 70.2 | 91.8 |
| tada-temporally-adaptive-convolutions-for-1 | 67.2 | 89.8 |
| relational-self-attention-what-s-missing-in | 66 | 89.8 |
| diverse-temporal-aggregation-and-depthwise | 65.8 | 89.5 |
| is-space-time-attention-all-you-need-for | 59.5 | - |
| learning-correlation-structures-for-vision | 71.5 | - |
| mutual-modality-learning-for-video-action | 69.02 | 92.70 |
| omnivore-a-single-model-for-many-visual | 71.4 | 93.5 |
| vidtr-video-transformer-without-convolutions | 60.2 | - |
| spatial-temporal-pyramid-graph-reasoning-for | 67.0 | - |
| morphmlp-a-self-attention-free-mlp-like | 70.1 | 92.8 |
| diverse-temporal-aggregation-and-depthwise | 63.2 | 88.2 |
| mar-masked-autoencoders-for-efficient-action | 73.8 | 94.4 |
| learning-self-similarity-in-space-and-time-as-1 | 67.7 | 91.1 |
| stand-alone-inter-frame-attention-in-video-1 | 69.8 | - |
| group-contextualization-for-video-recognition | 67.8 | 91.2 |
| global-temporal-difference-network-for-action | 67.6 | - |
| videomae-v2-scaling-video-masked-autoencoders | 77.0 | 95.9 |
| relational-self-attention-what-s-missing-in | 67.7 | 91.1 |
| hiera-a-hierarchical-vision-transformer | 76.5 | - |
| tada-temporally-adaptive-convolutions-for-1 | 67.1 | 90.4 |
| omnivl-one-foundation-model-for-image | 62.5 | 86.2 |
| tdn-temporal-difference-networks-for | 69.6 | 92.2 |
| co-training-transformer-with-videos-and | 70.9 | 92.5 |
| multiscale-vision-transformers | 67.8 | 91.3 |
| knowing-what-where-and-when-to-look-efficient | 66.5 | 90.4 |
| improved-multiscale-vision-transformers-for | 73.3 | 94.1 |
| zeroi2v-zero-cost-adaptation-of-pre-trained | 72.2 | 93.0 |
| action-recognition-with-motion | 67.1 | - |
| paying-more-attention-to-motion-attention | 49.9 | 79.1 |
| improved-multiscale-vision-transformers-for | - | 93.4 |
| diverse-temporal-aggregation-and-depthwise | 65.24 | 89.48 |
| internvideo2-scaling-video-foundation-models | 77.1 | - |
| temporal-reasoning-graph-for-activity | 61.3 | 91.4 |
| internvideo-general-video-foundation-models | 77.2 | - |
| temporal-pyramid-network-for-action | 62.0 | - |
| motionsqueeze-neural-motion-feature-learning | 64.7 | 89.4 |
| multiscale-vision-transformers | 66.2 | 90.2 |
| uniformerv2-spatiotemporal-learning-by-arming | 73.0 | 94.5 |
| learning-self-similarity-in-space-and-time-as-1 | 67.4 | 91 |
| uniformer-unified-transformer-for-efficient | 69.4 | 92.1 |
| diverse-temporal-aggregation-and-depthwise | 67.35 | 90.50 |
| multi-scale-motion-aware-module-for-video | 68.2 | - |
| few-shot-video-classification-via-temporal | 52.3 | - |
| what-can-simple-arithmetic-operations-do-for | 74.6 | 94.4 |
| relational-self-attention-what-s-missing-in | 67.3 | 90.8 |
| multiview-transformers-for-video-recognition | 68.5 | 90.4 |
| bevt-bert-pretraining-of-video-transformers | 71.4 | - |
| action-keypoint-network-for-efficient-video | 64.3 | - |
| tdn-temporal-difference-networks-for | 68.2 | 91.6 |
| motionsqueeze-neural-motion-feature-learning | 63 | 88.4 |
| mar-masked-autoencoders-for-efficient-action | 74.7 | 94.9 |
| multiscale-vision-transformers | 68.7 | 91.5 |
| diverse-temporal-aggregation-and-depthwise | 64.2 | 88.8 |
| side4video-spatial-temporal-side-network-for | 75.2 | 94.0 |
| cooperative-cross-stream-network-for | 61.2 | 89.3 |
| masked-video-distillation-rethinking-masked | 73.7 | 94.0 |
| improved-multiscale-vision-transformers-for | 72.1 | - |
| mar-masked-autoencoders-for-efficient-action | 71.0 | 92.8 |
| movinets-mobile-video-networks-for-efficient | 63.5 | 89.0 |
| co-training-transformer-with-videos-and | 69.8 | 91.9 |
| temporal-shift-module-for-efficient-video | 66.6 | 91.3 |
| keeping-your-eye-on-the-ball-trajectory | 66.5 | 90.1 |
| rethinking-video-vits-sparse-video-tubes-for | 76.1 | 95.2 |
| tds-clip-temporal-difference-side-network-for | 73.4 | 93.8 |
| is-space-time-attention-all-you-need-for | 62.5 | - |
| the-effectiveness-of-mae-pre-pretraining-for | 74.4 | - |
| tada-temporally-adaptive-convolutions-for-1 | 65.6 | 89.2 |
| videomae-masked-autoencoders-are-data-1 | 74.3 | 94.6 |
| implicit-temporal-modeling-with-learnable | 66.8 | 90.3 |
| temporally-adaptive-models-for-efficient | 71.1 | - |
| temporally-adaptive-models-for-efficient | 73.6 | - |
| parameter-efficient-image-to-video-transfer | 72.3 | 93.9 |
| prompt-learning-for-action-recognition | 67.3 | 91 |