HyperAI

Abstract

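Recent advances in Active Speaker Detection (ASD) build on a two-stage process: feature extraction and spatio-temporal context aggregation. This paper proposes an end-to-end ASD workflow in which feature learning and contextual prediction are performed jointly. Our end-to-end trainable network simultaneously learns multi-modal embeddings and aggregates spatio-temporal context, yielding feature representations better suited to the task and improving ASD performance. We also introduce interleaved Graph Neural Network (iGNN) blocks, which split message passing according to the main sources of context in the ASD problem. Experiments show that the features aggregated by the iGNN blocks are better suited to ASD and achieve state-of-the-art performance. Finally, we design a weakly supervised strategy, which demonstrates that the ASD problem can also be solved using audiovisual data while relying only on audio annotations. We achieve this by modeling the direct relationship between the audio signal and the possible sound sources (speakers) and by introducing a contrastive loss. All resources for this project will be made publicly available at: https://github.com/fuankarion/end-to-end-asd.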
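As a rough illustration of what splitting message passing by context source could look like, the toy sketch below alternates two aggregation steps over a small graph. It is not the authors' implementation: the abstract does not name the context sources or the aggregation rule, so the sketch assumes two sources (spatial edges between speakers at the same timestamp, temporal edges linking one speaker across timestamps) and plain mean aggregation over scalar features. The names `mean_aggregate` and `interleaved_block` are hypothetical.

```python
from collections import defaultdict


def mean_aggregate(features, edges):
    """One round of message passing: each node becomes the mean of
    its own feature and the features of its neighbours (undirected edges)."""
    neighbours = defaultdict(list)
    for src, dst in edges:
        neighbours[src].append(dst)
        neighbours[dst].append(src)
    updated = {}
    for node, value in features.items():
        group = [value] + [features[n] for n in neighbours[node]]
        updated[node] = sum(group) / len(group)
    return updated


def interleaved_block(features, spatial_edges, temporal_edges):
    """Assumed interleaving: a spatial step followed by a temporal step,
    each restricted to its own edge set."""
    after_spatial = mean_aggregate(features, spatial_edges)
    return mean_aggregate(after_spatial, temporal_edges)


# Toy graph: speakers A and B observed at timestamps 0 and 1 (scalar features).
features = {("A", 0): 1.0, ("B", 0): 3.0, ("A", 1): 5.0, ("B", 1): 7.0}
spatial_edges = [(("A", 0), ("B", 0)), (("A", 1), ("B", 1))]
temporal_edges = [(("A", 0), ("A", 1)), (("B", 0), ("B", 1))]

out = interleaved_block(features, spatial_edges, temporal_edges)
print(out)
```

In this toy case the spatial step first mixes the two speakers within each timestamp, and the temporal step then mixes each speaker's two timestamps. A real model would use learned transformations on embedding vectors rather than plain means.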

Juan León Alcázar Moritz Cordes Chen Zhao Bernard Ghanem

End-to-End Active Speaker Detection