JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models

Music generation has attracted growing interest with the advancement of deep generative models. However, generating music conditioned on textual descriptions, known as text-to-music, remains challenging due to the complexity of musical structures and high sampling rate requirements. Despite the task's significance, prevailing generative models exhibit limitations in music quality, computational efficiency, and generalization. This paper introduces JEN-1, a universal high-fidelity model for text-to-music generation. JEN-1 is a diffusion model incorporating both autoregressive and non-autoregressive training. Through in-context learning, JEN-1 performs various generation tasks including text-guided music generation, music inpainting, and continuation. Evaluations demonstrate JEN-1's superior performance over state-of-the-art methods in text-music alignment and music quality while maintaining computational efficiency. Our demos are available at https://jenmusic.ai/audio-demos