Video Object Tracking on CATER
Metrics
L1
Top 1 Accuracy
Top 5 Accuracy
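As a rough illustration of the three metrics listed above, the sketch below computes top-k accuracy and a mean L1 distance for a toy classification problem. It assumes that each class index names a cell on a square grid and that L1 is the Manhattan distance between the predicted and true cells; the function names, the `grid_width` parameter, and the toy data are all hypothetical and are not taken from any benchmark code.

```python
def top_k_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


def mean_l1_grid_distance(pred_cells, true_cells, grid_width):
    """Mean Manhattan distance between predicted and true cells.

    Assumption: class index i maps to (row, col) = divmod(i, grid_width).
    """
    total = 0
    for p, t in zip(pred_cells, true_cells):
        p_row, p_col = divmod(p, grid_width)
        t_row, t_col = divmod(t, grid_width)
        total += abs(p_row - t_row) + abs(p_col - t_col)
    return total / len(true_cells)


# Toy data: 3 samples, 4 classes laid out on a 2x2 grid (hypothetical values).
scores = [
    [0.1, 0.6, 0.2, 0.1],
    [0.5, 0.1, 0.3, 0.1],
    [0.2, 0.2, 0.5, 0.1],
]
labels = [1, 2, 3]

# The top-scoring class of each sample is its predicted cell.
predictions = [max(range(len(row)), key=lambda c: row[c]) for row in scores]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
l1 = mean_l1_grid_distance(predictions, labels, grid_width=2)
print(f"top-1: {top1:.1f}  top-2: {top2:.1f}  L1: {l1:.2f}")
```

Under these assumptions, higher is better for the accuracy metrics and lower is better for L1.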
Results
Performance of various models on this benchmark.
Model Name | L1 (lower is better) | Top 1 Accuracy (%) | Top 5 Accuracy (%) | Paper Title | Repository |
---|---|---|---|---|---|
Hopper | 0.85 | 73.2 | 93.8 | Hopper: Multi-hop Transformer for Spatiotemporal Reasoning | |
Inferno | - | 71.7 | 88.9 | INFERNO: Inferring Object-Centric 3D Scene Representations without Supervision | - |
TFCNet | 0.47 | 79.7 | 95.5 | TFCNet: Temporal Fully Connected Networks for Static Unbiased Temporal Reasoning | - |
I3D-50 + LSTM | 1.2 | 60.2 | 81.8 | Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | |
Loci | 0.14 | 90.7 | 98.5 | Learning What and Where: Disentangling Location and Identity Tracking Without Supervision | |
OPNet | 0.54 | 74.8 | - | Learning Object Permanence from Video | |
Aloe | 0.44 | 74.0 | 94.0 | Attention over learned object embeddings enables complex visual reasoning | |