HyperAI

Abstract

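We explore the problem of audio captioning: generating natural language descriptions for any kind of audio in the wild, a topic that has received surprisingly little attention in previous research. To this end, we build a large-scale dataset of 46,000 audio clips paired with human-written text descriptions, collected via crowdsourcing from the AudioSet dataset. Through systematic empirical studies, we not only verify that the collected captions are faithful to the audio inputs, but also identify which audio representations and captioning models are effective for the audio captioning task. Based on extensive experiments, we further propose two novel components that improve audio captioning performance: the top-down multi-scale encoder and aligned semantic attention.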

Chris Dongjoo Kim, Byeongchang Kim, Hyunmin Lee, Gunhee Kim

AudioCaps: Generating Captions for Audios in the Wild
