
AudioBox-Aesthetics Audio Aesthetics Evaluation Demo

1. Tutorial Introduction


This tutorial runs on a single RTX 4090 GPU.

2. Effect Examples

Evaluation Dimension | Description
Production Quality (PQ) | Focuses on the technical aspects of quality rather than subjective quality, including clarity and fidelity, dynamics, frequencies, and spatialization of the audio
Production Complexity (PC) | Focuses on the complexity of the audio scene, measured by the number of audio components
Content Enjoyment (CE) | Focuses on the subjective quality of an audio piece, an open-ended dimension covering aspects such as emotional impact, artistic skill, artistic expression, and subjective experience
Content Usefulness (CU) | A subjective dimension that evaluates how likely the audio is to be usable as source material for content creation
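
To make the four dimensions concrete, here is a minimal sketch of how one set of scores could be held and printed in Python. It assumes each evaluated clip yields one number per axis, keyed by the abbreviations PQ, PC, CE, and CU. The `AestheticScores` class and the sample values are hypothetical and exist only for illustration; they are not part of the demo and not real model output.

```python
from dataclasses import dataclass

# Axis abbreviations and names, taken from the dimension list above.
AXES = {
    "PQ": "Production Quality",
    "PC": "Production Complexity",
    "CE": "Content Enjoyment",
    "CU": "Content Usefulness",
}


@dataclass(frozen=True)
class AestheticScores:
    """Hypothetical container for one clip's four axis scores."""
    PQ: float
    PC: float
    CE: float
    CU: float

    @classmethod
    def from_dict(cls, raw):
        # Fail early if any of the four axes is missing.
        missing = [key for key in AXES if key not in raw]
        if missing:
            raise KeyError(f"missing axes: {missing}")
        return cls(**{key: float(raw[key]) for key in AXES})

    def report(self):
        # One line per axis, in the same order as the list above.
        return "\n".join(
            f"{AXES[key]} ({key}): {getattr(self, key):.2f}" for key in AXES
        )


# Made-up values for illustration only.
example = AestheticScores.from_dict({"CE": 5.1, "CU": 5.6, "PC": 2.3, "PQ": 6.2})
print(example.report())
```

Keeping the four axes as separate fields, rather than averaging them, matches the table: each axis measures a different property, so a clip can score high on production quality and low on complexity at the same time.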

3. Operation Steps

1. After starting the container, click the API address to enter the Gradio interactive interface

2. Once you enter the webpage, you can use the model

If the page shows "Bad Gateway", the model is still initializing. Because the model is large, wait about 2-3 minutes and then refresh the page.
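
The wait-and-refresh step can also be scripted. The sketch below polls an address until it stops answering with HTTP 502, assuming "Bad Gateway" corresponds to that status code. The function names, the 15-second polling interval, and the 5-minute cap are illustrative choices, not values from the platform.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=10.0):
    """Return the HTTP status code of a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Error responses such as 502 arrive as exceptions; return their code.
        # Connection-level failures (URLError) are deliberately not caught.
        return err.code


def wait_until_ready(url, fetch=http_status, interval=15.0, max_wait=300.0,
                     sleep=time.sleep):
    """Poll url until it no longer returns 502.

    Returns True once a non-502 status is seen, or False if max_wait
    seconds of sleeping have elapsed and the status is still 502.
    """
    waited = 0.0
    while True:
        if fetch(url) != 502:
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
```

Calling `wait_until_ready(api_address)` with the API address shown for the container blocks until the page is reachable or the cap is hit. The `fetch` and `sleep` parameters exist so the loop can be exercised without a network connection.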

Precautions

  • For best performance, we recommend uploading audio files no larger than 10 MB and no longer than 60 seconds.
  • Complex audio content, such as multi-instrument symphonies, may require longer evaluation time.
  • If the evaluation fails, check the file format or try shortening the audio clip.
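
The size and length recommendations above can be checked locally before uploading. The following is a minimal sketch using only the Python standard library, so it handles uncompressed WAV files only; the demo may accept other formats that this check does not cover. The `check_wav` helper is hypothetical, and treating "10 MB" as 10 × 1024 × 1024 bytes is an assumption.

```python
import os
import wave

MAX_BYTES = 10 * 1024 * 1024  # "10 MB" read as 10 MiB (an assumption)
MAX_SECONDS = 60.0


def check_wav(path):
    """Return a list of problems; an empty list means the file is within limits."""
    problems = []

    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size / 1_048_576:.1f} MB, over the 10 MB limit")

    try:
        with wave.open(path, "rb") as wav:
            # Duration = number of frames / frames per second.
            seconds = wav.getnframes() / wav.getframerate()
    except (wave.Error, EOFError) as err:
        problems.append(f"not a readable PCM WAV file: {err}")
        return problems

    if seconds > MAX_SECONDS:
        problems.append(f"clip is {seconds:.1f} s, over the 60 s limit")
    return problems
```

A non-empty result mirrors the last precaution: either the format is not what the check expects, or the clip should be shortened before trying again.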

Citation Information

The citation information for this project is as follows:

@article{tjandra2025aes,
    title={Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound},
    author={Andros Tjandra and Yi-Chiao Wu and Baishan Guo and John Hoffman and Brian Ellis and Apoorv Vyas and Bowen Shi and Sanyuan Chen and Matt Le and Nick Zacharov and Carleigh Wood and Ann Lee and Wei-Ning Hsu},
    year={2025},
    url={https://arxiv.org/abs/2502.05139}
}