Dense Video Captioning on ViTT
Metrics
CIDEr
METEOR
SODA
Results
Performance results of various models on this benchmark
Model Name | CIDEr | METEOR | SODA | Paper Title | Repository |
---|---|---|---|---|---|
HiCM² | 51.2 | 9.6 | 0.150 | HiCM$^2$: Hierarchical Compact Memory Modeling for Dense Video Captioning | - |
Vid2Seq | 43.5 | 8.5 | 0.135 | Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning | |
Vid2Seq (VidChapters-7M PT) | 50.9 | 9.5 | 0.151 | - | - |
E2ESG | 25.0 | 8.1 | - | End-to-end Dense Video Captioning as Sequence Generation | - |