Simple and Controllable Music Generation

We tackle the task of conditional music generation. We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen consists of a single-stage transformer LM together with efficient token interleaving patterns, which eliminates the need for cascading several models, e.g., hierarchically or via upsampling. Following this approach, we demonstrate how MusicGen can generate high-quality samples, both mono and stereo, while being conditioned on textual descriptions or melodic features, allowing better control over the generated output. We conduct extensive empirical evaluation, considering both automatic and human studies, showing that the proposed approach is superior to the evaluated baselines on a standard text-to-music benchmark. Through ablation studies, we shed light on the importance of each of the components comprising MusicGen. Music samples, code, and models are available at https://github.com/facebookresearch/audiocraft
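The token interleaving idea above can be illustrated with a minimal sketch. This is a hypothetical example, not the paper's implementation: it shows a "delay"-style pattern in which codebook stream k is shifted right by k steps, so that a single-stage LM can emit one token per codebook at each step instead of requiring a cascade of models. The function name, padding token, and plain-list representation are all assumptions for illustration.

```python
def delay_interleave(streams, pad=-1):
    """Shift codebook stream k right by k steps (a delay-style interleaving sketch).

    streams: list of K equal-length token lists, one per codebook.
    Returns K lists of length T + K - 1, padded with `pad` where no token falls.
    """
    K = len(streams)
    T = len(streams[0])
    out = [[pad] * (T + K - 1) for _ in range(K)]
    for k, stream in enumerate(streams):
        for t, tok in enumerate(stream):
            # Stream k's token for time t is emitted at LM step t + k.
            out[k][t + k] = tok
    return out

# Two codebook streams of three tokens each:
print(delay_interleave([[1, 2, 3], [4, 5, 6]]))
# → [[1, 2, 3, -1], [-1, 4, 5, 6]]
```

At decoding step t, the model predicts stream 0's token for time t and stream k's token for time t - k, which keeps all codebooks in a single autoregressive pass.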