Search for a command to run...
Audio-Visual Representation Learning durch Knowledge Distillation aus Speech Foundation Models