Weekly Editor's Picks | CCMusic Music Dataset Is Online, Revealing NVIDIA's self-developed Large Model ChipNeMo

To make AI training on music more efficient, CCMusic has opened several music and audio datasets for free use by computational musicology researchers. They are now available on hyper.ai. In addition, hyper.ai has added related music datasets from miHoYo and NetEase Cloud Music. Let’s take a look!

From January 22nd to January 26th, hyper.ai official website updates:

* High-quality public datasets: 10

* AI4S paper cases: 2

* Popular encyclopedia entries: 10

Visit the official website: hyper.ai

Selected public datasets

1. CCMUSIC Chest Voice and Falsetto Dataset

This dataset contains 1,280 monophonic singing audio clips (.wav format) in chest voice and falsetto. Each clip is labeled as either chest voice or falsetto.

Direct use:

https://hyper.ai/datasets/29125

2. CCMUSIC Piano Sound Quality Dataset

The dataset contains 12 gamut audio files (.wav / .mp3 / .m4a format) and 1320 split single-tone audio files (.wav / .mp3 / .m4a format) of 7 pianos in the piano room of the China Conservatory of Music (Kawai upright piano, Kawai grand piano, Yongchang upright piano, Xinghai upright piano, Steinway Grand Theater grand piano, Steinway grand piano and Pearl River upright piano), totaling 1332 files. In addition, there is a piano sound quality subjective evaluation questionnaire (.xls format), including the scores of 29 participants in the subjective evaluation of piano sound quality.

Direct use:

https://hyper.ai/datasets/29097

3. CCMUSIC music genre dataset

The dataset covers about 1,700 pieces of music (.mp3 format), each 270-300 seconds long, divided into 17 genres. Due to copyright restrictions on the original music, only spectrograms are provided in the dataset.

Direct use:

https://hyper.ai/datasets/29094

4. CCMUSIC Bel Canto National Singing Dataset

This dataset contains hundreds of a cappella recordings in two styles: bel canto and Chinese folk singing. All clips are sung by professional singers and recorded in professional commercial recording studios.

Direct use:

https://hyper.ai/datasets/29086

5. NetEase Cloud Music Sentiment Classification Dataset

This dataset contains about 395,000 music emotion label records, each consisting of three main columns: song ID, playlist ID, and song emotion label. The data comes from the official NetEase Cloud Music website, which provides detailed information on how song emotions are labeled. Given its size, the dataset is suitable for building sentiment analysis models, conducting data mining, and studying the relationship between music and emotion in depth.

Direct use:

https://hyper.ai/datasets/29133
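
As a rough illustration of working with the three-column layout described above, the sketch below counts how often each emotion label occurs. The column names (`song_id`, `playlist_id`, `emotion`), the CSV encoding, and the sample rows are all assumptions made for this example; they are not taken from the dataset itself, so check the actual files before reusing it.

```python
import csv
import io
from collections import Counter


def label_distribution(csv_file, label_column="emotion"):
    """Count rows per emotion label in a CSV with a header row.

    `label_column` is a hypothetical column name; adjust it to the real file.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


# Made-up placeholder rows, used only to show the call pattern.
sample = io.StringIO(
    "song_id,playlist_id,emotion\n"
    "1,10,happy\n"
    "2,10,sad\n"
    "3,11,happy\n"
)
counts = label_distribution(sample)
print(counts.most_common())
```

A label count like this is a common first check for class imbalance before training a sentiment classifier.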

6. miHoYo Music Remix Piano Dataset

This dataset mainly contains piano music clips from two miHoYo games, "Genshin Impact" and "Honkai: Star Rail". The clips have been converted into ABC notation scores. Researchers can use this resource to analyze musical features such as notes and melodic structure, and as training data for improving music generation algorithms.

Direct use:

https://hyper.ai/datasets/29150
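
For readers unfamiliar with ABC notation, here is a minimal sketch of separating a tune's header fields (such as `T:` for title, `M:` for meter, `K:` for key) from its note lines. The example tune is invented for illustration, and how the dataset actually stores its ABC scores is not stated above, so treat the input format as an assumption.

```python
def split_abc(tune: str):
    """Split an ABC tune into (header fields, body lines).

    In ABC notation the header is a run of `X:value` style fields,
    and the K: (key) field marks the end of the header.
    """
    header = {}
    body = []
    in_body = False
    for raw in tune.strip().splitlines():
        line = raw.strip()
        is_field = len(line) > 1 and line[0].isalpha() and line[1] == ":"
        if not in_body and is_field:
            header[line[0]] = line[2:].strip()
            if line[0] == "K":
                in_body = True  # everything after K: is music
        else:
            body.append(line)
    return header, body


# Invented example tune, not taken from the dataset.
example = """X:1
T:Example
M:4/4
L:1/8
K:C
CDEF GABc|
"""
header, body = split_abc(example)
print(header["K"], body)
```

This only handles a single tune with a simple header; a full ABC parser would also need to deal with inline fields, comments, and multiple tunes per file.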

7. FMA Music Analysis Dataset

FMA is a music analysis dataset consisting of full-length, high-quality audio, pre-computed features, and track- and user-level metadata. It can be used to evaluate multiple tasks in MIR (Music Information Retrieval).

Direct use:

https://hyper.ai/datasets/29162

8. High-Throughput Algae Cell Detection Dataset

This dataset comes from the 2023 IEEE Web Informatics Conference "Vision Meets Algae" object detection challenge, including training sets and test sets. The training set contains 700 images and the test set contains 300 images, divided into 6 categories. The training set is annotated in YOLO format, and each image has a corresponding .txt annotation file.

Direct use:

https://hyper.ai/datasets/29158
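
The YOLO annotation format mentioned above stores one object per line as `class_id x_center y_center width height`, with coordinates normalized to the image size. The following sketch converts such lines into pixel-space corner coordinates. The `Box` type and function names are hypothetical helpers for this example, and the image size must be supplied by the caller.

```python
from typing import NamedTuple


class Box(NamedTuple):
    class_id: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_yolo_line(line: str, img_w: int, img_h: int) -> Box:
    """Convert one YOLO annotation line to a pixel-space bounding box."""
    class_id, xc, yc, w, h = line.split()
    # Scale the normalized center and size back to pixels.
    xc_px, yc_px = float(xc) * img_w, float(yc) * img_h
    w_px, h_px = float(w) * img_w, float(h) * img_h
    return Box(
        int(class_id),
        xc_px - w_px / 2,
        yc_px - h_px / 2,
        xc_px + w_px / 2,
        yc_px + h_px / 2,
    )


def parse_yolo_text(text: str, img_w: int, img_h: int) -> list[Box]:
    """Parse the full contents of a .txt annotation file, skipping blank lines."""
    return [parse_yolo_line(l, img_w, img_h) for l in text.splitlines() if l.strip()]


box = parse_yolo_line("2 0.5 0.5 0.25 0.5", img_w=640, img_h=480)
print(box)  # Box(class_id=2, x_min=240.0, y_min=120.0, x_max=400.0, y_max=360.0)
```

Corner coordinates are convenient for drawing boxes or computing overlap, while the normalized form stays valid if images are resized.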

9. MathVista Mathematical Reasoning Dataset

MathVista is a comprehensive benchmark for mathematical reasoning in visual contexts. It includes three newly created datasets, IQTest, FunctionQA, and PaperQA, which are designed to evaluate logical reasoning on puzzle test figures, algebraic reasoning on function plots, and scientific reasoning on academic paper figures, respectively. In total, MathVista contains 6,141 examples collected from 31 different datasets.

Direct use:

https://hyper.ai/datasets/29122

10. Animals-10 Animal Image Dataset

This dataset contains about 28K medium-quality animal images belonging to 10 categories: dog, cat, horse, spider, butterfly, chicken, sheep, cow, squirrel, elephant. It can be used to test different image recognition networks.

Direct use:

https://hyper.ai/datasets/29079

ScienceAI Selected Case Studies

1. AI empowers green cooling, Lingnan University of Hong Kong develops DEMMFL model for building cooling load prediction

Researchers from Lingnan University and City University of Hong Kong proposed a new dynamic engineered multimodal feature learning (DEMMFL) model in the "Global Artificial Intelligence Challenge for Building Mechanical and Electrical Facilities". The model can accurately predict building cooling loads and help save energy. The work was published in the journal Applied Energy.

View the full report:

https://hyper.ai/news/29108

2. Roll yourself up? Nvidia releases a large model ChipNeMo, specially designed for chip design

NVIDIA has released a custom large language model, ChipNeMo, trained on its own internal data to help engineers complete tasks related to chip design. This article is a detailed introduction to ChipNeMo.

View the full report:

https://hyper.ai/news/29134

2 years ago

Information

AI for Science

Dataset


