Weekly Editor's Picks | RJUA-QA Medical Dataset Launched, 3D Molecular Generation Model ResGen Paper Analysis

HyperAI's new column is here! Every Monday, the HyperNeural editorial team selects content (datasets, AI4S paper cases, encyclopedia entries) updated on the hyper.ai official website during the previous week and publishes it here. You are also welcome to visit hyper.ai directly to view all the content!
A quick overview of updates to the hyper.ai official website from January 15th to January 21st:
* High-quality public datasets: 10
* AI4S paper cases: 2
* Popular encyclopedia entries: 10
Visit the official website: https://hyper.ai/
Selected public datasets
1. CrossDock2020: Dataset processed for the ResGen study
The raw data for this dataset contains more than 22 million protein-ligand pairs. The dataset can be used to study protein-small molecule interactions, especially to evaluate how well molecules bind to protein pockets.
Direct use:
https://hyper.ai/datasets/29021
2. RJUA-QA: The first Chinese medical specialty question-answering reasoning dataset
RJUA-QA is a question-answering reasoning dataset for urology. It was created by the Ant Group Medical LLM team in collaboration with the urology expert team of Renji Hospital, affiliated with Shanghai Jiao Tong University School of Medicine. The dataset converts real clinical patient data into virtual patient clinical dialogues, presented in the Q-context-A (question-context-answer) format.
Direct use:
https://hyper.ai/datasets/28970
3. MetaMathQA Mathematical Reasoning Dataset
To improve the forward and reverse reasoning capabilities of models, researchers from Cambridge, HKUST, and Huawei proposed MetaMathQA, a high-quality mathematical reasoning dataset with wide coverage, built on two commonly used mathematical datasets (GSM8K and MATH). MetaMathQA consists of 395K forward and reverse mathematical question-answer pairs generated by a large language model.
Direct use:
https://hyper.ai/datasets/28954
4. M³IT Multimodal Multilingual Instruction Tuning Dataset
The dataset combines 40 datasets with 2.4 million instances and 400 manually written task instructions, reformatted into a vision-to-text structure. It covers a range of classic vision-language tasks, including captioning, visual question answering (VQA), visually conditioned generation, reasoning, and classification.
Direct use:
https://hyper.ai/datasets/29048
5. ChatHaruhi-RolePlaying Role-Playing Dialogue Dataset
ChatHaruhi is a dataset covering 32 characters from Chinese and English TV shows and anime, with more than 54k simulated dialogues. Role-playing chatbots built with large language models have attracted widespread attention. To imitate specific fictional characters, the research team proposed an algorithm that controls the language model through improved prompts and character memories extracted from scripts. By collecting corpora from movies, novels, and scripts and performing structured extraction, the team gathered more than 23,000 dialogue messages.
Direct use:
https://hyper.ai/datasets/28926
For more datasets updated this week, please visit:
Selected AI4S paper cases
A research team from Zhejiang University and Zhijiang Laboratory proposed ResGen, a 3D molecule generation model based on protein pockets. It is 8 times faster than the previous best method and successfully generated drug-like molecules with lower binding energy and higher diversity. The paper has been published in the journal Nature Machine Intelligence.
View the full report:
Luo Xiaozhou's team from the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, proposed UniKP, a framework that predicts a variety of enzyme kinetic parameters. The paper has been published in the journal Nature Communications.
View the full report:
Popular encyclopedia entries
1. Sigmoid function
2. Markov Chain
3. Prompt Injection
4. Reward Model
5. Prompt Engineering
We have compiled hundreds of AI-related terms to help you understand "artificial intelligence":
The above is all the content of this week’s editor’s selection. If you have resources that you would like to include on the hyper.ai official website, you are also welcome to leave a message or submit an article to tell us!
See you next week!
About HyperAI
HyperAI (hyper.ai) is the leading artificial intelligence and high-performance computing community in China. We are committed to becoming the infrastructure for data science in China and to providing rich, high-quality public resources for domestic developers. So far, we have:
* Provided domestic accelerated download nodes for 1200+ public datasets
* Included 300+ classic and popular online tutorials
* Interpreted 100+ AI4Science paper cases
* Supported search for 500+ related terms
* Hosted the first complete Chinese documentation for Apache TVM in China
Visit the official website to start your learning journey: