Search for a command to run...
Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models