Long-form music generation with latent diffusion

Zach Evans, Julian D. Parker, CJ Carr, Zack Zukowski, Josiah Taylor, Jordi Pons

Abstract

Audio-based generative models for music have seen great strides recently, but so far have not managed to produce full-length music tracks with coherent musical structure. We show that by training a generative model on long temporal contexts it is possible to produce long-form music of up to 4m45s. Our model consists of a diffusion-transformer operating on a highly downsampled continuous latent representation (latent rate of 21.5Hz). It obtains state-of-the-art generations according to metrics on audio quality and prompt alignment, and subjective tests reveal that it produces full-length music with coherent structure.
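
To give a sense of scale for the figures quoted in the abstract, the following minimal sketch estimates how many latent frames a 4m45s track occupies at a latent rate of 21.5Hz. The duration and latent rate come from the abstract; the 44.1 kHz audio sample rate is an assumption made here for illustration only, and the function name `latent_frames` is hypothetical.

```python
def latent_frames(duration_s: float, latent_rate_hz: float) -> float:
    """Number of latent frames covering a clip of the given duration."""
    return duration_s * latent_rate_hz


# Figures stated in the abstract.
duration_s = 4 * 60 + 45   # 4m45s = 285 seconds
latent_rate_hz = 21.5      # latent frames per second

frames = latent_frames(duration_s, latent_rate_hz)

# ASSUMPTION (not stated in the abstract): 44.1 kHz audio sample rate.
assumed_sample_rate_hz = 44_100
samples = duration_s * assumed_sample_rate_hz
downsampling_ratio = assumed_sample_rate_hz / latent_rate_hz

print(f"latent frames:      {frames}")
print(f"audio samples:      {samples}")
print(f"downsampling ratio: {downsampling_ratio:.0f}x")
```

Under these assumptions, the sequence the diffusion-transformer has to model is a few thousand latent frames long rather than millions of audio samples, which is what makes training on such long temporal contexts tractable.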

