
COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers

Denize, Julien ; Liashuha, Mykola ; Rabarisoa, Jaonary ; Orcesi, Astrid ; Hérault, Romain
Abstract

We present COMEDIAN, a novel pipeline to initialize spatiotemporal transformers for action spotting, which involves self-supervised learning and knowledge distillation. Action spotting is a timestamp-level temporal action detection task. Our pipeline consists of three steps, with two initialization stages. First, we perform self-supervised initialization of a spatial transformer using short videos as input. Additionally, we initialize a temporal transformer that enhances the spatial transformer's outputs with global context through knowledge distillation from a pre-computed feature bank aligned with each short video segment. In the final step, we fine-tune the transformers to the action spotting task. The experiments, conducted on the SoccerNet-v2 dataset, demonstrate state-of-the-art performance and validate the effectiveness of COMEDIAN's pretraining paradigm. Our results highlight several advantages of our pretraining pipeline, including improved performance and faster convergence compared to non-pretrained models.
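
The second initialization stage, distillation from a pre-computed feature bank aligned with each short video segment, can be illustrated with a minimal sketch. The abstract does not specify the distillation objective, so the mean cosine distance used here is an assumption chosen for simplicity and is not the paper's loss. The function names `cosine_similarity` and `distillation_loss` are hypothetical, and plain Python lists stand in for the student outputs and the bank features.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def distillation_loss(student_outputs, feature_bank):
    """Mean cosine distance between student outputs and aligned bank features.

    student_outputs[i] is the (hypothetical) temporal-transformer output for
    segment i; feature_bank[i] is the pre-computed teacher feature aligned
    with that same segment. Cosine distance is an illustrative stand-in for
    the actual objective, which the abstract does not describe.
    """
    if len(student_outputs) != len(feature_bank):
        raise ValueError("each segment needs exactly one aligned bank feature")
    if not student_outputs:
        raise ValueError("at least one segment is required")
    distances = [
        1.0 - cosine_similarity(s, t)
        for s, t in zip(student_outputs, feature_bank)
    ]
    return sum(distances) / len(distances)
```

In this sketch the loss is zero when every student output points in the same direction as its aligned bank feature, and it grows as the two diverge. In a real training loop the same quantity would be computed on tensors and minimized by gradient descent before the fine-tuning step.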
