AdaFocus (Semi-weak, I3D-Charades-Pretrain-feature, D3G model) | 46.9 | 21.1 | 79.3 | 49.2 | Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | - |
D3G (Semi-weak, MViT-K400-Pretrain-feature, evaluated by AdaFocus) | 46.0 | 20.2 | 83.1 | 50.2 | D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation | - |
AdaFocus (Weak, I3D-Charades-Pretrain-feature, CPL model) | 49.1 | 22.4 | 84.2 | 51.8 | Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | - |
CPL (Weak, MViT-K400-Pretrain-feature, evaluated by AdaFocus) | 47.8 | 21.8 | 84.6 | 50.4 | Weakly Supervised Temporal Sentence Grounding With Gaussian-Based Contrastive Proposal Learning | - |
D3G (Semi-weak, I3D-K400-Pretrain-feature, evaluated by AdaFocus) | 41.7 | 18.8 | 78.2 | 48.0 | D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation | - |
AdaFocus (Full, I3D-Charades-Pretrain-feature, MMN model) | 56.7 | 35.6 | 87.9 | 65.0 | Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | - |
AdaFocus (Full, MViT-Charades-Pretrain-feature, MMN model) | 62.4 | 38.6 | 89.4 | 66.4 | Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | - |
CPL (Weak, I3D-K400-Pretrain-feature, evaluated by AdaFocus) | 39.6 | 18.6 | 81.4 | 49.2 | Weakly Supervised Temporal Sentence Grounding With Gaussian-Based Contrastive Proposal Learning | - |
MMN (Full, I3D-K400-Pretrain-feature, evaluated by AdaFocus) | 49.4 | 29.8 | 85.8 | 60.5 | Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding | - |
MMN (Full, MViT-K400-Pretrain-feature, evaluated by AdaFocus) | 55.2 | 32.2 | 88.3 | 62.7 | Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding | - |
AdaFocus (Semi-weak, MViT-Charades-Pretrain-feature, D3G model) | 50.1 | 21.8 | 86.1 | 54.6 | Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | - |
AdaFocus (Weak, MViT-Charades-Pretrain-feature, CPL model) | 51.7 | 23.2 | 85.2 | 52.6 | Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | - |