Date

2 years ago

Size

61.63 GB

Organization

Tags

Natural Language Processing

Language

Medical Imaging

ApolloCorpora is a multilingual medical dataset built jointly by the Shenzhen Research Institute of Big Data and a research team at the Chinese University of Hong Kong. It covers six languages spoken by a combined 6.1 billion people worldwide: English, Chinese, Hindi, Spanish, French, and Arabic. The data is drawn from books, clinical guidelines, encyclopedias, papers, forums, and exams. During processing, the researchers convert the raw pre-training corpus into question-answer pairs to strengthen the medical capabilities of models trained on it. ApolloCorpora also preserves localized features such as symptom diagnosis, drug names, communication terms, and medical practice standards, so that models can adapt to different cultures and medical systems. The dataset provides a foundation for developing and evaluating multilingual medical AI models and supports the global application of medical AI technology.

ApolloCorpus.torrent

Seeding: 1, Downloading: 0, Completed: 249, Total Downloads: 299

ApolloCorpus/
- README.md (1.51 KB)
- README.txt (3.01 KB)

This dataset is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.