HyperAI

Weekly Editor's Picks | MathPile Mathematical Reasoning Corpus Is Open-Sourced, Peking Union Medical College Hospital Leads the Use of AI to Assist in Detecting 13 Fundus Diseases

Recently, the Generative Artificial Intelligence Research Laboratory (GAIR) at Shanghai Jiao Tong University open-sourced MathPile, a high-quality and diverse pre-training dataset tailored to the field of mathematics, along with its commercial version, MathPile-Commercial. Both can now be downloaded from the hyper.ai official website, where MathVista, Math23K, and other popular mathematical datasets are also available.

Updates to the hyper.ai official website from February 19 to February 23:

* High-quality public datasets: 10

* AI4S paper cases: 4

* Popular encyclopedia entries: 10

Visit the official website: hyper.ai

Selected public datasets

1. MathPile Mathematical Reasoning Pre-training Corpus

The Generative Artificial Intelligence Research Laboratory at Shanghai Jiao Tong University has released MathPile, a high-quality, diverse pre-training corpus built specifically for the field of mathematics. It contains approximately 9.5 billion tokens and is designed to enhance the mathematical reasoning capabilities of large models.

Direct use:

https://hyper.ai/datasets/29543

2. MathPile-Commercial Mathematical Reasoning Pre-training Corpus (Commercial Version)

MathPile-Commercial is the commercial version of MathPile, obtained by removing from MathPile (the latest version, v0.2) the documents that prohibit commercial use. Specifically, the research team checked the source data for non-commercial restrictions, using the license information in the metadata for the arXiv source and keyword matching for the other sources.

Direct use:

https://hyper.ai/datasets/29545
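
The two-track filtering described for MathPile-Commercial (license metadata for arXiv, keyword matching elsewhere) can be pictured with a minimal Python sketch. This is not the research team's code: the `Document` fields, the keyword list, and the rule that a license URL containing `by-nc` marks a non-commercial document are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical keyword list; the keywords the MathPile team actually used
# are not given in this article.
NONCOMMERCIAL_KEYWORDS = ("non-commercial", "noncommercial", "by-nc")


@dataclass
class Document:
    source: str                      # e.g. "arxiv" or "web" (assumed labels)
    text: str
    metadata: dict = field(default_factory=dict)


def prohibits_commercial_use(doc: Document) -> bool:
    """Return True if the document looks like it forbids commercial use."""
    if doc.source == "arxiv":
        # Assumption: the license is stored as a URL under the "license" key,
        # and Creative Commons non-commercial licenses contain "by-nc".
        license_url = doc.metadata.get("license", "").lower()
        return "by-nc" in license_url
    # Other sources: fall back to keyword matching on the document text.
    lowered = doc.text.lower()
    return any(keyword in lowered for keyword in NONCOMMERCIAL_KEYWORDS)


def commercial_subset(docs: list[Document]) -> list[Document]:
    """Keep only the documents that do not prohibit commercial use."""
    return [doc for doc in docs if not prohibits_commercial_use(doc)]
```

Keyword matching of this kind is coarse, so a real pipeline would likely need manual spot checks on the documents it removes.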

3. AI-generated image dataset

This dataset contains 19 images of boys generated by Copilot, an AI companion that creates imaginative and innovative content. These images are suitable for face and pose detection tasks because they vary in facial expressions, poses, backgrounds, lighting, and occlusions.

Direct use:

https://hyper.ai/datasets/29527

4. A diverse AI-generated portrait dataset

The dataset contains 140 high-quality images carefully crafted by advanced AI algorithms, including 70 female portraits and 70 male portraits. Each image in the dataset demonstrates the extraordinary ability of artificial intelligence in mimicking the complexity of human appearance.

Direct use:

https://hyper.ai/datasets/29529

5. THUCNews Chinese text classification dataset

THUCNews was built by filtering the historical data of the Sina News RSS subscription channels from 2005 to 2011. It contains 740,000 news documents (2.19 GB), all in UTF-8 plain text format. Based on the original Sina News classification system, the research team reorganized the data into 14 candidate categories: finance, lottery, real estate, stocks, home, education, technology, society, fashion, current affairs, sports, horoscopes, games, and entertainment.

Direct use:

https://hyper.ai/datasets/29521

6. ShareGPT 90k Chinese and English bilingual human-machine question answering dataset

ShareGPT-Chinese-English-90k is a high-quality, parallel Chinese-English human-machine question-answering dataset that covers user questions from real and complex scenarios. It can be used to train high-quality dialogue models.

Direct use:

https://hyper.ai/datasets/29523

7. SMP-2017 Chinese Conversation Intent Recognition Dataset

This dataset is the SMP2017 Chinese Human-Computer Dialogue Technology Evaluation (ECDT) Task 1 dataset. This evaluation aims to promote the development of research related to Chinese human-computer dialogue systems.

Direct use:

https://hyper.ai/datasets/29515

8. Toutiao text classification dataset

This is a classification dataset of Toutiao Chinese news (short text). The data comes from the Toutiao client and contains 382,688 texts in 15 categories, collected in May 2018.

Direct use:

https://hyper.ai/datasets/29517

For more updated datasets this week, please visit:

https://hyper.ai/datasets

ScienceAI Paper Case Studies

1. Led by Peking Union Medical College Hospital, five ophthalmology centers work together to use AI to assist in the detection of 13 types of fundus diseases

The diagnosis of ophthalmic diseases depends heavily on image recognition, which makes ophthalmology well suited to technologies such as deep learning. To further explore the potential value of deep learning in diagnosing fundus diseases, Chen Youxin, director of the Department of Ophthalmology at Peking Union Medical College Hospital, led five ophthalmology centers across the country in developing a deep learning system, in cooperation with Beijing Zhiyuan Huitu Technology Co., Ltd. and Professor Li Xirong of the School of Information at Renmin University of China. The system helped primary ophthalmologists improve diagnostic consistency by about 12% and provides a new method for the automatic detection of 13 major fundus diseases. The relevant paper has been published in the journal "Nature".

View the full report:

https://hyper.ai/news/29549

2. With more than 50,000 participants, a new study from Professor Wu Xifeng's team at Zhejiang University finds that health is related to the level of greenery around the workplace

The ecological environment has a subtle impact on human health. Professor Wu Xifeng's research team at the School of Public Health of Zhejiang University used a convolutional neural network model to evaluate visible green exposure based on the green view index of street view images, and then explored whether there is a beneficial association between the level of visible greenery in the workplace and metabolic syndrome in adults. The team assessed the level of outdoor visible greenery in the working environment of more than 50,000 adults in Hangzhou and used a logistic regression model to analyze it, confirming a beneficial association between the two. The relevant results have been published in the journal "Environment International".

View the full report:

https://hyper.ai/news/29559
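
For readers unfamiliar with the green view index, it is commonly computed as the share of pixels in a street view image that show vegetation. The sketch below illustrates only that ratio; it is not the Zhejiang University team's pipeline. It assumes a segmentation model has already labeled every pixel, and the label value `1` for vegetation and both function names are hypothetical.

```python
def green_view_index(mask, vegetation_label=1):
    """Fraction of pixels labeled as vegetation in one segmented image.

    `mask` is a list of rows, each row a list of per-pixel class labels.
    The label id used for vegetation is an assumption for this example.
    """
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask must contain at least one pixel")
    green = sum(1 for row in mask for label in row if label == vegetation_label)
    return green / total


def mean_green_view_index(masks, vegetation_label=1):
    """Average the index over several images taken at the same location."""
    values = [green_view_index(mask, vegetation_label) for mask in masks]
    return sum(values) / len(values)
```

Averaging over several viewing directions at one location is a common way to get a single exposure value per site, which could then serve as an input variable to a regression model.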

3. The AI4S team of Shanghai Jiao Tong University proposed the concept of "intelligent scientific facilities" to establish an interdisciplinary AI research assistant

Professor Yang Xiaokang and colleagues from the AI for Science team at the Institute of Artificial Intelligence, Shanghai Jiao Tong University, proposed a concept for building intelligent scientific facilities that would provide innovative functions such as large-scale models for scientific fields, generative simulation and inversion, autonomous intelligent unmanned experiments, and large-scale trusted scientific research collaboration. The relevant research results have been published in the "Journal of the Chinese Academy of Sciences".

View the full report:

https://hyper.ai/news/29559

4. Selected by Amazon engineers, a collection of over 40 LLM papers

More and more companies and traditional industries are beginning to explore how to apply large language models to their own businesses. The rapidly expanding market demand has also driven the further deepening and innovation of research in related fields, and the papers on platforms such as arXiv are being updated more frequently. In order to help everyone retrieve high-value papers faster, Amazon engineer Eugene Yan and others have established a language model paper reading list to continuously share cutting-edge papers. Currently, more than 40 high-quality papers have been compiled.

View the full paper summary:

https://hyper.ai/news/29582

Popular Encyclopedia Articles

1. Recall (Recall Rate)

2. Reinforcement Learning from Human Feedback (RLHF)

3. Artificial General Intelligence (AGI)

4. Retrieval-Augmented Generation (RAG)

5. Neural Radiance Field (NeRF)

We have compiled hundreds of AI-related terms to help you understand "artificial intelligence":

https://hyper.ai/wiki

That is all for this week's editor's picks. If you have resources you would like to see included on the hyper.ai official website, you are welcome to leave a message or submit an article to let us know!

See you next week!

About HyperAI

HyperAI (hyper.ai) is the leading artificial intelligence and high-performance computing community in China. We are committed to becoming the infrastructure for data science in China and to providing rich, high-quality public resources for domestic developers. So far, we have:

* Provided domestic accelerated download nodes for 1200+ public datasets

* Included 300+ classic and popular online tutorials

* Interpreted 100+ AI4Science paper cases

* Supported search for 500+ related terms

* Hosted the first complete Chinese-language Apache TVM documentation in China

Visit the official website to start your learning journey:

https://hyper.ai/