HyperAI

Weekly Editor's Picks | AISHELL Speech Dataset Series Launched, Chinese Scholars Develop Breast Cancer Prognosis Scoring System MIRS

a year ago
Information
zhaorui

🏮During the Spring Festival, we travel thousands of miles to reunite with our families

🏮Fireworks on earth, strong New Year atmosphere, family fun

Farewell to the Jade Rabbit, and welcome the spring with the Golden Dragon. Tomorrow night is New Year's Eve! HyperAI would like to wish everyone a happy Chinese New Year in advance. This week, the hyper.ai official website launched the open Chinese speech databases from AISHELL, which total thousands of hours of recordings and open the door to speech research.

Updates to the hyper.ai official website from February 5 to February 8:

* High-quality public datasets: 8

* AI4S paper cases: 2

* Popular encyclopedia entries: 8

Visit the official website: hyper.ai

Selected public datasets

1. AISHELL-1 Open Source Chinese Speech Database

This dataset was recorded by 400 speakers from different regions of China with different accents. The recordings were transcribed and annotated by professional voice proofreaders and passed strict quality inspection, and the text accuracy of the database is above 95%. It is divided into a training set, a development set, and a test set.

Direct use:

https://hyper.ai/datasets/29344

2. AISHELL-2 Chinese Speech Database

The AISHELL-2 Mandarin Chinese speech database contains 1,000 hours of speech recordings covering 12 domains, including wake-up words, voice control commands, smart home, autonomous driving, and industrial production.

Direct use:

https://hyper.ai/datasets/29347

3. AISHELL-3 High-fidelity Chinese speech database

This dataset was recorded by 218 speakers from different accent regions in China. Professional voice proofreaders annotated the pinyin and prosody, and the data passed strict quality inspection. The accuracy of the phonetic transcriptions in this database is above 98%.

Direct use:

https://hyper.ai/datasets/29352

4. AISHELL-4 Multi-channel Chinese Conference Speech Database

AISHELL-4 consists of 211 recorded conference sessions, each with 4 to 8 speakers, for a total duration of 120 hours. It can be used for tasks such as speech front-end processing and speech recognition.

Direct use:

https://hyper.ai/datasets/29375

5. AISHELL-WakeUp-1 Chinese and English wake-up word voice database

A total of 254 speakers took part in recording this dataset, which contains nearly 4 million wake-up word utterances totaling 1,561.12 hours. The recorded text consists of the wake-up words "Hello, Mia" and "hi, mia". The database has been transcribed and annotated by professional voice proofreaders and has passed strict quality inspection. It can be used for research such as voiceprint recognition and wake-up word recognition.

Direct use:

https://hyper.ai/datasets/29186

6. AISHELL-DMASH Chinese Mandarin Microphone Array Home Scene Speech Database

The AISHELL-DMASH dataset was recorded in real smart home scenarios in two different rooms and contains 30,000 hours of speech data. It was transcribed by professional speech annotators with a word accuracy of 98%, and can be used for research such as voiceprint recognition, speech recognition, and wake-up word recognition.

Direct use:

https://hyper.ai/datasets/29380

7. DeepSymNet Deep Symbol Network Dataset

DeepSymNet is a new symbolic network proposed by researchers from the Institute of Semiconductors, Chinese Academy of Sciences. It represents symbolic expressions and is used for symbolic regression.

Direct use:

https://hyper.ai/datasets/29321

8. Evol Instruct Chinese GPT4 text dataset

This dataset was created in the following way:

(1) Translate the English questions of Evol-instruct-70k into Chinese;

(2) Request GPT4 to generate Chinese answers.
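
The two steps above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the dataset authors' actual code: `build_chinese_pairs`, `translate_to_zh`, and `generate_answer` are hypothetical names, the two callables stand in for whatever translation system and GPT4 client were actually used, and the `instruction` and `output` field names are assumptions about the record layout.

```python
from typing import Callable, Dict, Iterable, List


def build_chinese_pairs(
    english_records: Iterable[Dict[str, str]],
    translate_to_zh: Callable[[str], str],
    generate_answer: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Build Chinese question/answer pairs from English instruction records.

    Both callables are hypothetical placeholders supplied by the caller.
    """
    pairs = []
    for record in english_records:
        # Step (1): translate the English question into Chinese.
        zh_question = translate_to_zh(record["instruction"])
        # Step (2): request a Chinese answer to the translated question.
        zh_answer = generate_answer(zh_question)
        pairs.append({"instruction": zh_question, "output": zh_answer})
    return pairs
```

Passing the translator and the answer generator in as arguments keeps the sketch independent of any particular service, so either step can be swapped or stubbed out for testing.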

Direct use:

https://hyper.ai/datasets/29318

ScienceAI Selected Case Studies

1. Targeting the world's most common cancer, Chinese scholars establish the breast cancer prognostic scoring system MIRS

Recently, Chinese scholars used a neural network model to establish MIRS, a scoring system for predicting breast cancer prognosis and treatment, which can be used to guide the formulation of treatment strategies for breast cancer patients. The relevant results have been published in the journal "iScience".

View the full report:

https://hyper.ai/news/29304

2. Shenzhen Institute of Advanced Technology of Chinese Academy of Sciences proposed SBeA, which analyzes animal social behavior based on a few-shot learning framework

Animal behavior research urgently needs technological innovation to improve its efficiency and accuracy. SBeA (Social Behavior Atlas) was created to meet this need. Developed by the Shenzhen Institute of Advanced Technology of the Chinese Academy of Sciences, it can comprehensively quantify the behavior of freely moving animals and perform multi-animal 3D pose estimation using a small number of labeled frames (about 400). Through a bidirectional transfer learning strategy, the accuracy of multi-animal identity recognition exceeds 90%. The relevant results have been published in the journal "Nature Machine Intelligence".

View the full report:

https://hyper.ai/news/29353

Popular Encyclopedia Articles

1. Floating-point operations per second (FLOPS)

2. Random Walk

3. Virtual Screening

4. Music Information Retrieval (MIR)

5. Quantum Neural Network

We have compiled hundreds of AI-related terms to help you understand "artificial intelligence":

https://hyper.ai/wiki

That is all for this week's editor's picks. If you have resources that you would like to see included on the hyper.ai official website, you are welcome to leave a message or submit an article to let us know!

As the Chinese New Year approaches, HyperAI once again wishes everyone good luck, prosperity, and a happy and fulfilling life! In the new year, we will bring you more surprises!

See you in the Year of the Dragon!

About HyperAI

HyperAI (hyper.ai) is the leading artificial intelligence and high-performance computing community in China. We are committed to becoming the infrastructure for data science in China and to providing rich, high-quality public resources for domestic developers. So far, we have:

* Provided domestic accelerated download nodes for 1,200+ public datasets

* Included 300+ classic and popular online tutorials

* Interpreted 100+ AI4Science paper cases

* Supported search for 500+ related terms

* Hosted the first complete Chinese documentation for Apache TVM in China

Visit the official website to start your learning journey:

https://hyper.ai/