CoCap (ViT-L/14) | 44.4 | 57.2 | 30.3 | 63.4 | Accurate and Fast Compressed Video Captioning | - |
IcoCap (ViT-B/16) | 47.0 | 60.2 | 31.1 | 64.9 | IcoCap: Improving Video Captioning by Compounding Images | - |
IcoCap (ViT-B/32) | 46.1 | 59.1 | 30.3 | 64.3 | IcoCap: Improving Video Captioning by Compounding Images | - |