MT-SLVR: Multi-Task Self-Supervised Learning for Transformation In(Variant) Representations

Contrastive self-supervised learning has gained attention for its ability to create high-quality representations from large unlabelled datasets. A key reason that these powerful features enable data-efficient learning of downstream tasks is that they provide augmentation invariance, which is often a useful inductive bias. However, the amount and type of invariance preferred are not known a priori, and vary across different downstream tasks. We therefore propose a multi-task self-supervised framework (MT-SLVR) that learns both variant and invariant features in a parameter-efficient manner. Our multi-task representation provides a strong and flexible feature that benefits diverse downstream tasks. We evaluate our approach on few-shot classification tasks drawn from a variety of audio domains and demonstrate improved classification performance on all of them.
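To make the idea of jointly learning variant and invariant features concrete, the sketch below shows one plausible instantiation: a shared encoder trained with two heads, a contrastive (NT-Xent) head that encourages augmentation invariance and an augmentation-prediction head that encourages augmentation-variant features. This is a minimal illustration under stated assumptions, not the paper's implementation; the module names, dimensions, loss choices, and the weighting scheme are all assumptions made for this example.

```python
# Minimal multi-task SSL sketch (illustrative only; not the MT-SLVR code).
# Assumptions: flat 1024-d inputs, NT-Xent as the invariance loss, and a
# simple augmentation-classification loss as the variance-encouraging task.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTaskSSL(nn.Module):
    def __init__(self, in_dim=1024, feat_dim=128, proj_dim=64, num_aug_classes=4):
        super().__init__()
        # Shared backbone producing the representation used downstream.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim)
        )
        # Head for the contrastive (augmentation-invariant) task.
        self.proj_head = nn.Linear(feat_dim, proj_dim)
        # Head for predicting which augmentation was applied (augmentation-variant task).
        self.aug_head = nn.Linear(feat_dim, num_aug_classes)

    def forward(self, x):
        return self.encoder(x)


def nt_xent(z1, z2, temperature=0.5):
    """Standard NT-Xent contrastive loss over two augmented views."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)
    sim = z @ z.t() / temperature
    n = z1.size(0)
    # Exclude self-similarity; the positive for view i is its paired view.
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool, device=z.device), float("-inf"))
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)


def multitask_loss(model, view1, view2, aug_labels, alpha=0.5):
    """Weighted sum of invariant (contrastive) and variant (aug-prediction) losses."""
    h1, h2 = model(view1), model(view2)
    loss_inv = nt_xent(model.proj_head(h1), model.proj_head(h2))
    # Predict the augmentation class applied to the second view from its features.
    loss_var = F.cross_entropy(model.aug_head(h2), aug_labels)
    return alpha * loss_inv + (1 - alpha) * loss_var
```

In such a setup, both losses backpropagate into the shared encoder, so its features retain information about the applied transformations while still supporting instance discrimination; a downstream task can then rely on whichever aspect is useful.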