6 months ago

Audio Classification

Audio and Speech Processing

Kai Yu Mengyue Wu Zeyu Xie Xuenan Xu

Abstract

This report proposes an audio captioning system for Task 6 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 challenge. Our audio captioning system consists of a 10-layer convolutional neural network (CNN) encoder and a single-layer gated recurrent unit (GRU) decoder with temporal attention. In this challenge, there is no restriction on the usage of external data and pre-trained models. To better model the concepts in an audio clip, we pre-train the CNN encoder with audio tagging on AudioSet. After standard cross-entropy-based training, we further fine-tune the model with reinforcement learning to directly optimize the evaluation metric. Experiments show that our proposed system achieves a SPIDEr of 28.6 on the public evaluation split without ensembling.
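As an illustration of the temporal attention mentioned in the abstract, the following is a minimal sketch of a single decoding step: the decoder state scores every encoder frame, the scores are normalized with a softmax, and the frames are averaged with those weights into a context vector. The abstract does not specify the scoring function, so the dot-product scoring used here is an assumption, and the names `temporal_attention`, `frames`, and `hidden` as well as the toy values are hypothetical. This is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(frames, hidden):
    """Return (weights, context) for one decoding step.

    frames: list of T encoder frame embeddings, each a list of D floats
    hidden: decoder hidden state, a list of D floats
    Dot-product scoring is an assumption made for brevity; the paper's
    scoring function may differ.
    """
    # One scalar score per time frame.
    scores = [sum(f * h for f, h in zip(frame, hidden)) for frame in frames]
    # Normalize scores into attention weights over time.
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the frame embeddings.
    dim = len(hidden)
    context = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return weights, context


# Toy input: three frames of dimension 2 (made-up values).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [1.0, 1.0]
weights, context = temporal_attention(frames, hidden)
```

In a full decoder, the context vector would typically be combined with the previous word embedding and fed to the GRU at each step, so the model can attend to different parts of the clip while generating each word.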

THE SJTU SYSTEM FOR DCASE2021 CHALLENGE TASK 6: AUDIO CAPTIONING BASED ON ENCODER PRE-TRAINING AND REINFORCEMENT LEARNING