Zero-Shot Video Question Answer on EgoSchema
Metrics
Accuracy
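The page does not define the metric. As a minimal sketch, assuming accuracy means the percentage of multiple-choice questions a model answers correctly, it could be computed as shown here. The function name `accuracy_percent` and the sample data are hypothetical and not taken from any listed model's evaluation code.

```python
from typing import Sequence


def accuracy_percent(predictions: Sequence[int], answers: Sequence[int]) -> float:
    """Return the share of correct predictions as a percentage (0 to 100).

    Assumption: each question has exactly one correct option, encoded as an
    integer index, and every question carries equal weight.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Hypothetical example: four questions, three answered correctly.
sample_predictions = [0, 1, 2, 3]
sample_answers = [0, 1, 2, 4]
sample_score = accuracy_percent(sample_predictions, sample_answers)
```

Under this reading, a score in the table is the fraction of benchmark questions answered correctly, multiplied by 100.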
Results
Performance results of various models on this benchmark
Comparison table
Model name | Accuracy (%) |
---|---|
mvbench-a-comprehensive-multi-modal-video | 65.6 |
understanding-long-videos-in-one-multimodal | 60.3 |
Model 3 | 20.0 |
language-repository-for-long-video | 66.2 |
a-simple-llm-framework-for-long-range-video | 50.8 |
slowfast-llava-a-strong-training-free | 47.2 |
a-simple-llm-framework-for-long-range-video | 57.6 |
tarsier-recipes-for-training-and-evaluating-1 | 68.6 |
self-chained-image-language-model-for-video-1 | 25.7 |
too-many-frames-not-all-useful-efficient | 66.0 |
ts-llava-constructing-visual-tokens-through | 57.8 |
videotree-adaptive-tree-based-video | 66.2 |
mvbench-a-comprehensive-multi-modal-video | 63.6 |