Action Recognition On Epic Kitchens 100
Evaluation Metrics
Action@1
GFLOPs
Noun@1
Verb@1
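As a rough illustration of how the three accuracy metrics relate, the sketch below assumes the usual EPIC-KITCHENS convention that an action is a (verb, noun) pair and that a clip counts toward Action@1 only when both the top-1 verb and the top-1 noun are correct. The function name `top1_accuracies` and its list-based inputs are hypothetical, chosen for illustration; individual entries on this leaderboard may derive their action predictions differently.

```python
def top1_accuracies(verb_pred, noun_pred, verb_true, noun_true):
    """Return Verb@1, Noun@1 and Action@1 as percentages.

    Each argument is a sequence of class ids, one entry per clip.
    Assumption: an action is correct only if verb AND noun both match.
    """
    n = len(verb_true)
    if n == 0:
        raise ValueError("need at least one sample")
    if not (len(verb_pred) == len(noun_pred) == len(noun_true) == n):
        raise ValueError("all inputs must have the same length")

    # Per-clip correctness for each label type.
    verb_hits = [p == t for p, t in zip(verb_pred, verb_true)]
    noun_hits = [p == t for p, t in zip(noun_pred, noun_true)]
    # An action hit requires both the verb and the noun to be right.
    action_hits = [v and m for v, m in zip(verb_hits, noun_hits)]

    return {
        "Verb@1": 100.0 * sum(verb_hits) / n,
        "Noun@1": 100.0 * sum(noun_hits) / n,
        "Action@1": 100.0 * sum(action_hits) / n,
    }


if __name__ == "__main__":
    scores = top1_accuracies(
        verb_pred=[1, 2, 3, 4], noun_pred=[5, 6, 7, 8],
        verb_true=[1, 2, 0, 4], noun_true=[5, 0, 7, 8],
    )
    print(scores)
```

Under this convention Action@1 can never exceed the smaller of Verb@1 and Noun@1, which matches the pattern in the results on this page.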
Evaluation Results
Performance of each model on this benchmark
| Model | Action@1 | GFLOPs | Noun@1 | Verb@1 | Paper Title |
| --- | --- | --- | --- | --- | --- |
| Avion (ViT-L) | 54.4 | - | 65.4 | 73.0 | Training a Large Video Model on a Single Machine in a Day |
| M&M (WTS 60M) | 53.6 | - | 66.3 | 72.0 | M&M Mix: A Multimodal Multiview Transformer Ensemble |
| LVMAE | 52.1 | - | 61.8 | 75.0 | Extending Video Masked Autoencoders to 128 frames |
| TAdaFormer-L/14 | 51.8 | - | 64.1 | 71.7 | Temporally-Adaptive Models for Efficient Video Understanding |
| LaViLa (TimeSformer-L) | 51 | - | 62.9 | 72 | Learning Video Representations from Large Language Models |
| MTV-B (WTS 60M) | 50.5 | - | 63.9 | 69.9 | Multiview Transformers for Video Recognition |
| OMNIVORE (Swin-B, finetuned) | 49.9 | - | 61.7 | 69.5 | Omnivore: A Single Model for Many Visual Modalities |
| CAST-B/16 | 49.3 | - | 60.9 | 72.5 | CAST: Cross-Attention in Space and Time for Video Action Recognition |
| TAdaConvNeXtV2-S | 48.9 | - | 60.2 | 71.0 | Temporally-Adaptive Models for Efficient Video Understanding |
| MeMViT-24 | 48.4 | - | 60.3 | 71.4 | MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition |
| MMT | 47.8 | - | 61.0 | 70.1 | Multiscale Multimodal Transformer for Multimodal Action Recognition |
| MoViNet-A6 | 47.7 | 117x1 | 57.3 | 72.2 | MoViNets: Mobile Video Networks for Efficient Video Recognition |
| AVT | 47.2 | - | 59.3 | 70.4 | AVT: Audio-Video Transformer for Multimodal Action Recognition |
| ORViT Mformer-L (ORViT blocks) | 45.7 | - | 58.7 | 68.4 | Object-Region Video Transformers |
| TempAgg | 45.26 | - | 53.35 | 66 | Technical Report: Temporal Aggregate Representations |
| MoViNet-A5 | 44.5 | 74.9x1 | 55.1 | 69.1 | MoViNets: Mobile Video Networks for Efficient Video Recognition |
| Mformer-HR | 44.5 | - | 58.5 | 67.0 | Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers |
| GSF | 44.48 | - | 53.18 | 69.06 | Gate-Shift-Fuse for Video Action Recognition |
| MoViNet-A4 | 44.4 | 42.2x1 | 56.2 | 68.8 | MoViNets: Mobile Video Networks for Efficient Video Recognition |
| Mformer-L | 44.1 | - | 57.6 | 67.1 | Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers |