HyperAI

Retrieval-augmented Few-shot In-context Audio Captioning is an audio description generation technique based on few-shot in-context learning. At inference time, it retrieves a few relevant examples from the training data and supplies them to the model as context, so the model can generate accurate, contextually appropriate captions without large-scale training on the target dataset. This makes audio content understanding and labeling efficient and flexible.
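The retrieval step described above can be sketched in a few lines. This is a minimal illustration, not the method of any particular model: it assumes each training clip already has an embedding vector (here, toy 2-D lists), ranks clips by cosine similarity to the query, and formats the top matches as a few-shot prompt. The function names `retrieve_examples` and `build_prompt`, the placeholder tags such as `<audio_1>`, and the sample captions are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# An example is (embedding, caption). Embeddings are assumed to be
# precomputed by some audio encoder; toy vectors are used here.
Example = Tuple[Sequence[float], str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_examples(query: Sequence[float], bank: List[Example], k: int = 4) -> List[Example]:
    """Return the k bank entries most similar to the query embedding."""
    ranked = sorted(bank, key=lambda ex: cosine(query, ex[0]), reverse=True)
    return ranked[:k]


def build_prompt(examples: List[Example], query_tag: str = "<query_audio>") -> str:
    """Format retrieved captions as in-context examples, ending with an open slot."""
    lines = [f"<audio_{i}> Caption: {cap}" for i, (_, cap) in enumerate(examples, 1)]
    lines.append(f"{query_tag} Caption:")
    return "\n".join(lines)


# Toy retrieval bank (hypothetical data).
bank: List[Example] = [
    ([1.0, 0.0], "A dog barks"),
    ([0.0, 1.0], "Rain falls on a roof"),
    ([0.9, 0.1], "A dog barks while cars pass"),
    ([0.1, 0.9], "Thunder rumbles in the distance"),
]

top = retrieve_examples([1.0, 0.05], bank, k=2)
prompt = build_prompt(top)
print(prompt)
```

In a real system the prompt would interleave the retrieved audio clips themselves with their captions and pass them to an audio language model. The number of retrieved examples `k` corresponds to the "shot" count.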

AudioCaps

Audio Flamingo (4-shot)
