HyperAI

Weekly Editor's Picks | CCMusic Music Dataset Is Online, Revealing NVIDIA's Self-developed Large Model ChipNeMo

a year ago
Information
Yudi

To support more efficient AI training on music, CCMusic has opened up several music and audio datasets for free use by computational musicology researchers. They are now available on hyper.ai. In addition, hyper.ai has added related music datasets, including ones based on miHoYo and NetEase Cloud Music. Let's take a look!

From January 22nd to January 26th, hyper.ai official website updates:

* High-quality public datasets: 10

* AI4S paper cases: 2

* Popular encyclopedia entries: 10

Visit the official website: hyper.ai

Selected public datasets

1. CCMUSIC Chest Voice and Falsetto Dataset

This dataset contains 1,280 monophonic singing recordings (.wav format) in chest voice and falsetto. Each recording is labeled as either chest voice or falsetto.

Direct use:

https://hyper.ai/datasets/29125

2. CCMUSIC Piano Sound Quality Dataset

The dataset contains 12 full-range audio files (.wav / .mp3 / .m4a format) and 1,320 split single-tone audio files (.wav / .mp3 / .m4a format) recorded from 7 pianos in the piano room of the China Conservatory of Music (Kawai upright piano, Kawai grand piano, Yongchang upright piano, Xinghai upright piano, Steinway Grand Theater grand piano, Steinway grand piano, and Pearl River upright piano), for a total of 1,332 files. It also includes a subjective piano sound quality questionnaire (.xls format) with the scores given by 29 participants.

Direct use:

https://hyper.ai/datasets/29097

3. CCMUSIC Music Genre Dataset

The dataset covers about 1,700 pieces of music (.mp3 format), each 270-300 seconds long, divided into 17 genres. Because the original music is copyrighted, the dataset provides only spectrograms.

Direct use:

https://hyper.ai/datasets/29094

4. CCMUSIC Bel Canto and Chinese Folk Singing Dataset

This dataset contains hundreds of a cappella recordings in two styles: bel canto and Chinese folk singing. All clips are performed by professional singers and recorded in professional commercial recording studios.

Direct use:

https://hyper.ai/datasets/29086

5. NetEase Cloud Music Sentiment Classification Dataset

This dataset contains about 395,000 music emotion label records, each consisting of three main columns: song ID, playlist ID, and song emotion label. The data comes from the official NetEase Cloud Music website, which provides detailed information on how song emotions are labeled. Given its size, the dataset is well suited to building sentiment analysis models, data mining, and studying the relationship between music and emotion.

Direct use:

https://hyper.ai/datasets/29133
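As a rough illustration of working with the three-column layout described above, the sketch below counts how often each emotion label appears. The CSV format, the column names (`song_id`, `playlist_id`, `emotion`), and the sample rows are all assumptions made for illustration; the actual file format and label values of the dataset are not described here.

```python
import csv
import io
from collections import Counter


def count_labels(csv_file, label_column="emotion"):
    """Count occurrences of each label in a CSV file-like object.

    `label_column` is a hypothetical column name; adjust it to the real header.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


# Made-up sample rows, not taken from the dataset.
SAMPLE = (
    "song_id,playlist_id,emotion\n"
    "1,10,happy\n"
    "2,10,sad\n"
    "3,11,happy\n"
)

counts = count_labels(io.StringIO(SAMPLE))
for label, n in counts.most_common():
    print(label, n)
```

Checking the label distribution first is a common step before training a classifier, since heavily imbalanced labels usually call for resampling or class weighting.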

6. miHoYo Music Remix Piano Dataset

This dataset mainly contains piano music clips from two miHoYo games, "Genshin Impact" and "Honkai: Star Rail". The piano clips have been converted into ABC notation scores. Researchers can use this resource to analyze musical features such as notes and melodic structure, and as training data for improving music generation algorithms.

Direct use:

https://hyper.ai/datasets/29150
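For readers unfamiliar with ABC notation, the sketch below splits a tune into its header fields and its music body. In ABC, a tune header is a series of `letter:value` lines that conventionally starts with `X:` (reference number) and ends with `K:` (key). The sample tune is made up for illustration and is not taken from the dataset, and `split_abc_tune` is a hypothetical helper name; how the dataset actually stores its scores is not described here.

```python
import re

# A header line looks like "T:Some Title" (single letter, colon, value).
HEADER_FIELD = re.compile(r"^([A-Za-z]):\s*(.*)$")


def split_abc_tune(abc_text):
    """Return (header_fields, body_lines) for a single ABC tune."""
    header = {}
    body = []
    in_header = True
    for raw in abc_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_header:
            match = HEADER_FIELD.match(line)
            if match:
                key, value = match.group(1), match.group(2).strip()
                header.setdefault(key, value)
                if key == "K":  # the key field closes the header
                    in_header = False
            continue
        body.append(line)
    return header, body


# Made-up example tune.
SAMPLE_TUNE = """X:1
T:Example Tune
M:4/4
L:1/8
K:C
C D E F | G A B c |
"""

header, body = split_abc_tune(SAMPLE_TUNE)
print(header)
print(body)
```

Because ABC is plain text, scores in this form can be fed to text-based sequence models directly, which is one reason it is a common choice for symbolic music generation.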

7. FMA Music Analysis Dataset

FMA is a music analysis dataset consisting of full-length, high-quality audio, pre-computed features, and track- and user-level metadata. It can be used to evaluate multiple tasks in MIR (Music Information Retrieval).

Direct use:

https://hyper.ai/datasets/29162

8. High-Throughput Algae Cell Detection Dataset

This dataset comes from the 2023 IEEE Web Informatics Conference "Vision Meets Algae" object detection challenge and includes a training set and a test set. The training set contains 700 images and the test set contains 300 images, divided into 6 categories. The training set is annotated in YOLO format, and each image has a corresponding .txt annotation file.

Direct use:

https://hyper.ai/datasets/29158
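The YOLO annotation format mentioned above is typically one line per object: `class_id x_center y_center width height`, with the four box values normalized to the range 0-1 relative to the image size. The sketch below parses such a file and converts a box to pixel corners. It assumes this common five-field layout; the sample line and image size are made up, and the helper names are hypothetical rather than part of any dataset tooling.

```python
from dataclasses import dataclass


@dataclass
class YoloBox:
    class_id: int
    x_center: float
    y_center: float
    width: float
    height: float


def parse_yolo_labels(text):
    """Parse the contents of one YOLO .txt annotation file."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 5:
            raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
        x, y, w, h = (float(p) for p in parts[1:])
        boxes.append(YoloBox(int(parts[0]), x, y, w, h))
    return boxes


def to_pixel_corners(box, img_w, img_h):
    """Convert a normalized box to (x_min, y_min, x_max, y_max) in pixels."""
    x_min = (box.x_center - box.width / 2) * img_w
    y_min = (box.y_center - box.height / 2) * img_h
    x_max = (box.x_center + box.width / 2) * img_w
    y_max = (box.y_center + box.height / 2) * img_h
    return x_min, y_min, x_max, y_max


# Made-up annotation: one object of class 2 centered in the image.
boxes = parse_yolo_labels("2 0.5 0.5 0.2 0.4\n")
corners = to_pixel_corners(boxes[0], img_w=100, img_h=200)
print(boxes[0], corners)
```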

9. MathVista Mathematical Reasoning Dataset

MathVista is a comprehensive mathematical reasoning benchmark in a visual environment. It consists of three newly created datasets, IQTest, FunctionQA, and PaperQA, which are designed to evaluate logical reasoning on jigsaw test graphs, algebraic reasoning on function graphs, and scientific reasoning on academic paper graphs, respectively. In total, MathVista contains 6,141 examples collected from 31 different datasets.

Direct use:

https://hyper.ai/datasets/29122

10. Animals-10 Animal Image Dataset

This dataset contains about 28K medium-quality animal images belonging to 10 categories: dog, cat, horse, spider, butterfly, chicken, sheep, cow, squirrel, elephant. It can be used to test different image recognition networks.

Direct use:

https://hyper.ai/datasets/29079

ScienceAI Selected Case Studies

1. AI empowers green cooling: Lingnan University in Hong Kong develops DEMMFL model for building cooling load prediction

Researchers from Lingnan University and City University of Hong Kong proposed a new dynamic engineered multimodal feature learning (DEMMFL) model in the "Global Artificial Intelligence Challenge for Building Mechanical and Electrical Facilities". The model can accurately predict building cooling loads and help save energy. The work was published in the journal Applied Energy.

View the full report:

https://hyper.ai/news/29108

2. Competing with itself? NVIDIA releases ChipNeMo, a large model built specifically for chip design

NVIDIA has released a custom large language model, ChipNeMo, trained on its own internal data to help engineers with chip design tasks. This article introduces ChipNeMo in detail.

View the full report:

https://hyper.ai/news/29134

Popular Encyclopedia Articles

1. Nuclear Norm

2. Paired t-Test

3. Distributed Computing

4. Mixture of Experts (MoE)

5. Retrieval-Augmented Generation (RAG)

We have compiled hundreds of AI-related terms to help you understand "artificial intelligence": https://hyper.ai/wiki

That's all for this week's editor's picks. If you have resources you would like to see included on the hyper.ai official website, you are welcome to leave a message or submit an article to let us know!

See you next week!

About HyperAI

HyperAI (hyper.ai) is the leading artificial intelligence and high-performance computing community in China. We are committed to becoming the infrastructure for data science in China and to providing rich, high-quality public resources for domestic developers. So far, we have:

* Provided domestic accelerated download nodes for 1,200+ public datasets

* Included 300+ classic and popular online tutorials

* Interpreted 100+ AI4Science paper cases

* Supported search for 500+ related terms

* Hosted the first complete Apache TVM Chinese documentation in China

Visit the official website to start your learning journey: https://hyper.ai/