Action Recognition in Videos on UCF101
Metrics
3-fold Accuracy
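As a rough illustration of the metric, the sketch below assumes that 3-fold accuracy means the top-1 accuracy averaged over the three standard UCF101 train/test splits, reported in percent. The function names `split_accuracy` and `three_fold_accuracy` are hypothetical and do not come from this benchmark page or from any particular library.

```python
from statistics import mean


def split_accuracy(predictions, labels):
    """Top-1 accuracy in percent for a single train/test split."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("a split must contain at least one sample")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def three_fold_accuracy(splits):
    """Mean of the per-split accuracies (assumed definition of the metric).

    `splits` is a list of three (predictions, labels) pairs, one per split.
    """
    if len(splits) != 3:
        raise ValueError("expected exactly three splits")
    return mean(split_accuracy(p, y) for p, y in splits)


if __name__ == "__main__":
    # Toy class ids, purely illustrative; not real benchmark data.
    toy_splits = [
        ([0, 1, 2, 3], [0, 1, 2, 3]),
        ([0, 1, 2, 3], [0, 1, 2, 0]),
        ([0, 0], [0, 1]),
    ]
    print(three_fold_accuracy(toy_splits))
```

Under this assumption each split contributes equally to the final number regardless of how many test clips it contains. Some papers report results on split 1 only, so figures in the table may not all be directly comparable.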
Results
Performance of various models on this benchmark
Comparison table
Model name | 3-fold Accuracy (%) |
---|---|
large-scale-video-classification-with-1 | 65.4 |
hallucinet-ing-spatiotemporal-representations | 79.83 |
dmc-net-generating-discriminative-motion-cues | 96.5 |
learning-spatio-temporal-representation-with-3 | 97 |
i3d-lstm-a-new-model-for-human-action | 95.1 |
vimpac-video-pre-training-via-masked-token | 92.7 |
quo-vadis-action-recognition-a-new-model-and | 97.8 |
real-time-action-recognition-with-enhanced | 86.4 |
spatiotemporal-residual-networks-for-video | 94.6 |
a-closer-look-at-spatiotemporal-convolutions | 93.3 |
self-supervised-video-transformer | 93.7 |
adaptive-frame-selection-in-two-dimensional | - |
mlgcn-multi-laplacian-graph-convolutional | 63.27 |
two-stream-video-classification-with-cross | 96.5 |
a-closer-look-at-spatiotemporal-convolutions | 93.6 |
quo-vadis-action-recognition-a-new-model-and | 96.7 |
hidden-two-stream-convolutional-networks-for | 97.1 |
omnivec2-a-novel-transformer-based-network | 99.6 |
multi-fiber-networks-for-video-recognition | 96.0 |
omnivec-learning-robust-representations-with | 99.6 |
a-closer-look-at-spatiotemporal-convolutions | 95.5 |
temporal-spatial-mapping-for-action | 94.3 |
ligar-lightweight-general-purpose-action | 94.85 |
vidtr-video-transformer-without-convolutions | 96.7 |
learning-spatio-temporal-representation-with-3 | 98.2 |
faster-recurrent-networks-for-video | 96.9 |
dmc-net-generating-discriminative-motion-cues | 92.3 |
quo-vadis-action-recognition-a-new-model-and | 95.1 |
quo-vadis-action-recognition-a-new-model-and | 93.4 |
dynamic-image-networks-for-action-recognition | 89.1 |
r-stan-residual-spatial-temporal-attention | 94.5 |
potion-pose-motion-representation-for-action | 29.3 |
asymmetric-masked-distillation-for-pre | 97.1 |
convolutional-two-stream-network-fusion-for | 92.5 |
quo-vadis-action-recognition-a-new-model-and | 98.0 |
learning-spatio-temporal-representation-with | 88.6 |
learning-spatiotemporal-features-with-3d | 82.3 |
multi-region-two-stream-r-cnn-for-action | 91.1 |
paying-more-attention-to-motion-attention | 95.7 |
action-recognition-with-trajectory-pooled | 91.5 |
learning-spatio-temporal-representation-with-3 | 96.8 |
contextual-action-cues-from-camera-sensor-for | 97.2 |
long-term-temporal-convolutions-for-action | 91.7 |
efficient-action-recognition-using-confidence | 91.2 |
can-spatiotemporal-3d-cnns-retrace-the | 94.5 |
bidirectional-cross-modal-knowledge | 98.8 |
video-classification-with-finecoarse-networks | 97.6 |
appearance-and-relation-networks-for-video | 94.3 |
towards-good-practices-for-very-deep-two | 91.4 |
omni-sourced-webly-supervised-learning-for | 98.6 |
two-stream-convolutional-networks-for-action | 88.0 |
distinit-learning-video-representations | 85.8 |
d3d-distilled-3d-networks-for-video-action | 97 |
d3d-distilled-3d-networks-for-video-action | 97.6 |
cooperative-cross-stream-network-for | 97.4 |
bubblenet-a-disperse-recurrent-structure-to | 97.62 |
optical-flow-guided-feature-a-fast-and-robust | 96 |
ts-lstm-and-temporal-inception-exploiting | 94.1 |
a2-nets-double-attention-networks | 96.4 |
a-closer-look-at-spatiotemporal-convolutions | 97.3 |
mars-motion-augmented-rgb-stream-for-action | 97.8 |
mars-motion-augmented-rgb-stream-for-action | 95.8 |
actionflownet-learning-motion-representation | 83.9 |
quo-vadis-action-recognition-a-new-model-and | 95.6 |
Model 65 | 35.2 |
end-to-end-learning-of-motion-representation | 95.4 |
rethinking-spatiotemporal-feature-learning | 96.8 |
a-closer-look-at-spatiotemporal-convolutions | 96.8 |
federated-self-supervised-learning-for-video | - |
enhancing-video-transformers-for-action | 99.7 |
perf-net-pose-empowered-rgb-flow-net | 98.6 |
quo-vadis-action-recognition-a-new-model-and | 96.5 |
d3d-distilled-3d-networks-for-video-action | 97.1 |
zeroi2v-zero-cost-adaptation-of-pre-trained | 98.6 |
beyond-short-snippets-deep-networks-for-video | 88.6 |
smart-frame-selection-for-action-recognition | 98.64 |
learning-spatio-temporal-representations-with | 95.2 |
a-closer-look-at-spatiotemporal-convolutions | 95 |
holistic-large-scale-video-understanding | 97.8 |
videomoco-contrastive-video-representation | 78.7 |
video-action-recognition-collaborative | 86.1 |
convnet-architecture-search-for | 85.8 |
videomoco-contrastive-video-representation | 74.1 |
an-image-is-worth-16x16-words-what-is-a-video | 97 |
temporal-segment-networks-towards-good | 94.2 |
r-stan-residual-spatial-temporal-attention | 91.5 |
towards-universal-representation-for-unseen | 42.5 |
transferring-textual-knowledge-for-visual | 98.2 |
dance-with-flow-two-in-one-stream-action | 92 |
videomae-v2-scaling-video-masked-autoencoders | 99.6 |