Dense Video Captioning on YouCook2
Evaluation Metrics
ROUGE-L
Evaluation Results
Performance of each model on this benchmark
Model Name | ROUGE-L | Paper Title | Repository |
---|---|---|---|
E2vidD6-MASSalign-BiD | 39.03 | Multimodal Pretraining for Dense Video Captioning | |
Vid2Seq (HowTo100M+VidChapters-7M PT) | - | - | - |
CM² | - | Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval | |
PDVC (TSN features, no SCST) | - | End-to-End Dense Video Captioning with Parallel Decoding | |
HiCM² | - | HiCM$^2$: Hierarchical Compact Memory Modeling for Dense Video Captioning | - |
GVL | - | Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos | |
Vid2Seq | - | Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning | |