Weekly Editor's Picks | AISHELL Speech Dataset Series Launched, Chinese Scholars Develop Breast Cancer Prognosis Scoring System MIRS

🏮 For the Spring Festival, we travel thousands of miles to reunite with our families

🏮 Fireworks all around, a festive New Year atmosphere, and family joy

Farewell to the Jade Rabbit, and welcome to spring with the Golden Dragon. Tomorrow night is New Year's Eve! HyperAI wishes everyone a happy Chinese New Year in advance. This week, the hyper.ai official website launched the AISHELL series of open Chinese speech databases, totaling thousands of hours of recordings.

Updates to the hyper.ai official website from February 5 to February 8:

* High-quality public datasets: 8

* AI4S paper cases: 2

* Popular encyclopedia entries: 8

Visit the official website: hyper.ai

Selected public datasets

1. AISHELL-1 Open Source Chinese Speech Database

This dataset was recorded by 400 speakers from different accent regions of China. It was transcribed and annotated by professional speech proofreaders and passed strict quality inspection; the text accuracy of the database is above 95%. It is divided into a training set, a development set, and a test set.

Direct use:

https://hyper.ai/datasets/29344

2. AISHELL-2 Chinese Speech Database

The AISHELL-2 Mandarin Chinese speech database contains 1,000 hours of recordings covering 12 domains, including wake-up words, voice control commands, smart home, autonomous driving, and industrial production.

Direct use:

https://hyper.ai/datasets/29347

3. AISHELL-3 High-fidelity Chinese speech database

This dataset was recorded by 218 speakers from different accent regions of China. Professional speech proofreaders annotated the pinyin and prosody, and the data passed strict quality inspection. The accuracy of the pinyin and character annotations is above 98%.

Direct use:

https://hyper.ai/datasets/29352

4. AISHELL-4 Multi-channel Chinese Conference Speech Database

AISHELL-4 consists of 211 recorded conference sessions, each with 4 to 8 speakers, totaling 120 hours. It can be used for tasks such as speech front-end processing and speech recognition.

Direct use:

https://hyper.ai/datasets/29375

5. AISHELL-WakeUp-1 Chinese and English wake-up word voice database

A total of 254 speakers took part in recording this dataset, producing nearly 4 million wake-up word utterances totaling 1,561.12 hours. The recorded text consists of the wake-up words "Hello, Mia" and "hi, mia". The database was transcribed and annotated by professional speech proofreaders and passed strict quality inspection. It can be used for research such as voiceprint recognition and wake-up word recognition.

Direct use:

https://hyper.ai/datasets/29186

6. AISHELL-DMASH Chinese Mandarin Microphone Array Home Scene Speech Database

The AISHELL-DMASH dataset was recorded in real smart home scenarios in two different rooms and contains 30,000 hours of speech data. It was transcribed by professional speech annotators with a word accuracy of 98%, and can be used for research such as voiceprint recognition, speech recognition, and wake-up word recognition.

Direct use:

https://hyper.ai/datasets/29380

7. DeepSymNet Deep Symbol Network Dataset

DeepSymNet is a new symbolic network proposed by researchers from the Institute of Semiconductors, Chinese Academy of Sciences. It represents symbolic expressions and is used for symbolic regression.

Direct use:

https://hyper.ai/datasets/29321

8. Evol Instruct Chinese GPT4 text dataset

This dataset was created in the following way:

(1) Translate the English questions of Evol-instruct-70k into Chinese;

(2) Ask GPT4 to generate answers in Chinese.

Direct use:

https://hyper.ai/datasets/29318
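The two steps above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the dataset authors' actual pipeline: the function name `build_chinese_instruct`, the `translate` and `answer` callables, and the `instruction`/`output` field names are all hypothetical. In practice, `translate` would wrap a translation service and `answer` would wrap a call to GPT4; here they are stubs so the example runs on its own.

```python
from typing import Callable, Dict, Iterable, List


def build_chinese_instruct(
    english_questions: Iterable[str],
    translate: Callable[[str], str],
    answer: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Build question/answer records in two steps.

    translate: hypothetical callable, English question -> Chinese question.
    answer:    hypothetical callable, Chinese question -> Chinese answer
               (in the real dataset this role is played by GPT4).
    """
    records = []
    for question in english_questions:
        zh_question = translate(question)   # step (1): translate the question
        zh_answer = answer(zh_question)     # step (2): generate a Chinese answer
        # Field names are an assumption for illustration only.
        records.append({"instruction": zh_question, "output": zh_answer})
    return records


if __name__ == "__main__":
    # Stub callables stand in for the translation and GPT4 calls.
    demo = build_chinese_instruct(
        ["What is symbolic regression?"],
        translate=lambda q: "[zh] " + q,
        answer=lambda q: "[answer to] " + q,
    )
    print(demo)
```

Passing the two steps in as callables keeps the sketch independent of any particular translation or model API, so either step can be swapped out without touching the loop.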

ScienceAI Selected Case Studies

1. Aiming at the world's most common cancer, Chinese scholars established the breast cancer prognostic scoring system MIRS

Chinese scholars recently used a neural network model to establish MIRS, a scoring system for predicting breast cancer prognosis and treatment, which can be used to guide the formulation of treatment strategies for breast cancer patients. The results were published in the journal "iScience".

View the full report:

https://hyper.ai/news/29304

2. Shenzhen Institute of Advanced Technology of Chinese Academy of Sciences proposed SBeA, which analyzes animal social behavior based on a few-shot learning framework

Animal behavior research urgently needs technological innovation to improve its efficiency and accuracy. SBeA (Social Behavior Atlas) was developed in response by the Shenzhen Institute of Advanced Technology of the Chinese Academy of Sciences. It can comprehensively quantify the behavior of freely moving animals and needs only a small number of labeled frames (about 400) to perform multi-animal 3D pose estimation. Through a bidirectional transfer learning strategy, the accuracy of multi-animal identity recognition exceeds 90%. The results have been published in the journal "Nature Machine Intelligence".

View the full report:

https://hyper.ai/news/29353

Popular Encyclopedia Articles

1. Floating-Point Operations Per Second (FLOPS)

2. Random Walk

3. Virtual Screening

4. Music Information Retrieval (MIR)

5. Quantum Neural Network

We have compiled hundreds of AI-related terms to help you understand "artificial intelligence":

https://hyper.ai/wiki

That is all for this week's editor's picks. If you have resources that you would like to see included on the hyper.ai official website, you are welcome to leave a message or send us a submission!

As the Chinese New Year approaches, HyperAI once again wishes everyone good luck, prosperity, and a happy and fulfilling life! In the new year, we will bring you more surprises!

See you in the Year of the Dragon!

About HyperAI

HyperAI (hyper.ai) is the leading artificial intelligence and high-performance computing community in China. We are committed to becoming the infrastructure for data science in China and to providing rich, high-quality public resources for domestic developers. So far, we have:

* Provided domestic accelerated download nodes for 1,200+ public datasets

* Included 300+ classic and popular online tutorials

* Interpreted 100+ AI4Science paper cases

* Supported search for 500+ related terms

* Hosted the first complete Chinese-language Apache TVM documentation in China

Visit the official website to start your learning journey:

https://hyper.ai/


2 years ago

Information

AI for Science

Artificial Intelligence

Dataset
