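ApolloCorpora Multilingual Medical Dataset
ApolloCorpora is a multilingual medical dataset built jointly by research teams at the Shenzhen Research Institute of Big Data and The Chinese University of Hong Kong. It covers six major languages spoken by a combined 6.1 billion people worldwide: English, Chinese, Hindi, Spanish, French, and Arabic.
The data was collected from books, clinical guidelines, encyclopedias, papers, forums, and exams. During processing, the researchers converted the raw pretraining corpus into question-answer pairs to strengthen models' medical capabilities. ApolloCorpora also emphasizes localized features such as symptom diagnosis, drug names, communication terminology, and medical practice standards, so that it fits different cultures and healthcare systems. The dataset provides a solid foundation for developing and evaluating multilingual medical AI models and helps advance the global adoption of medical AI technology.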
ApolloCorpus.torrent
Seeding: 1 | Downloading: 2 | Completed: 130 | Total downloads: 105
- ApolloCorpus/
- README.md (1.51 KB)
- README.txt (3.01 KB)
- data/
- ApolloCorpus.zip (20.17 GB)
- ApolloCorpus/
- .git/
- HEAD (20.17 GB)
- config (20.17 GB)
- description (20.17 GB)
- hooks/
- applypatch-msg.sample (20.17 GB)
- commit-msg.sample (20.17 GB)
- fsmonitor-watchman.sample (20.17 GB)
- post-checkout (20.17 GB)
- post-commit (20.17 GB)
- post-merge (20.17 GB)
- post-update.sample (20.17 GB)
- pre-applypatch.sample (20.17 GB)
- pre-commit.sample (20.17 GB)
- pre-merge-commit.sample (20.17 GB)
- pre-push (20.17 GB)
- pre-push.sample (20.17 GB)
- pre-rebase.sample (20.17 GB)
- pre-receive.sample (20.17 GB)
- prepare-commit-msg.sample (20.17 GB)
- push-to-checkout.sample (20.17 GB)
- update.sample (20.17 GB)
- index (20.17 GB)
- info/
- exclude (20.17 GB)
- lfs/
- objects/
- 11/
- 92/
- 1192fe7bd289f093b8f396fbb170f840251c8c131e717006719a578a45b9ac21 (21.11 GB)
- 12/
- 1f/
- 121f8b1c5a4b43bdb333f24f3a7de5d74747911979bbd7fd949319191f828f0a (21.12 GB)
- 28/
- be/
- 28be668d69ca50c9663f3c76de98ebcc3a58193fca7bd064729956f054619fcc (22.28 GB)
- 29/
- 0a/
- 290a32cd7cb11008f75e1c900561ee06d12b3de0b3eb5387c7eeffad94374512 (22.41 GB)
- 31/
- d6/
- 31d686969d1ff43a133ef78d9e5ce9748a3d29bb4d13da5ad072b11f02642cc3 (22.72 GB)
- 32/
- 28/
- 3228e7f07ff481d4c8152f2637cdcb4e3267499db1411ca35c927061656f431a (22.76 GB)
- 36/
- 06/
- 3606f698c026dded753f49d0e18d200ded4118d54eab76d806dde65f5c45dd5f (22.83 GB)
- 40/
- a4/
- 40a463c8f0d0b61904d383973d83cefd11f70485675df0bda24708f83aed4fa2 (22.94 GB)
- 50/
- 9d/
- 509de1d3f83919d962395c379ac2ee90021f5dc66c5090840b3f3e57db02059f (24.38 GB)
- 58/
- 55/
- 5855b20e017ef701a43cb6f66601c0f21bd8c2f3c83bae09441a0e9a9054b8fc (26.43 GB)
- 66/
- ef/
- 66ef20d6b556de8288628e9d231ab14fbe9334f7c5e0b05a8c1ee2f126fc78db (26.75 GB)
- 70/
- 82/
- 70823abe1a64276779080b1d315cfd01a39f5038762aee96d51ea8ecb9cbdfd5 (26.77 GB)
- 88/
- 76/
- 887608dbfc32fd2d82a076701c56bbca87e85d42f09c02752f4f391768fbc9c0 (27.53 GB)
- 91/
- 8b/
- 918b2ef2c528f4b00f54cc18eb99ecc56d895a61de7521ef70c7a256f8108914 (28.03 GB)
- 96/
- de/
- 96de5e54ca11e82987680a802cc9191659f1f7843f310c6827f34a656a7f570c (29.08 GB)
- 00/
- 1c/
- 001c6a7373cc466033637f7824b0b0edbb92ea04684fb538a4a7202c99d9454a (20.22 GB)
- 03/
- bb/
- 03bb47a85d7c1c7c0b63e1b88111b941114eaa5979ea46bded580f7a2744fc18 (21.07 GB)
- 09/
- 40/
- 09405bf3a1ece73190f65935a8e64a413de2bf6260208a49a94fd6acc3c3b8d1 (21.09 GB)
- 0b/
- 8d/
- 0b8def5335be2d2ba254539e9455138d4f5e6767a79349820ec92d42bb2b055b (21.09 GB)
- 1b/
- a8/
- 1ba8c980524f8d6abceefeba751ba3fb9c1468c040f6950d669f47ef568842c0 (22.03 GB)
- 1d/
- 6d/
- 1d6d9a7a48717bc1e1f13e0e2430e614d0da40118814bc9c6f4568aa79ac549c (22.11 GB)
- 2a/
- 75/
- 2a755f50baded82c74f9fce895bcf8f6f8ce9eba53b451a4120fddce872eb0e5 (22.54 GB)
- 2c/
- 6f/
- 2c6feca5547754b2381b29fc06ecc400a60fa007ca37750c9392d6ecf1fbdfb4 (22.54 GB)
- 4b/
- 24/
- 4b24421da10c797b2f20ccc9eb8f1ce1f9eedb1d65e44f8941f62ae011392c11 (23.93 GB)
- 5a/
- 79/
- 5a79cd825fb49654ecc267f790a799cc809c6dd595b3399774d8bb00c8fbb386 (26.43 GB)
- 5d/
- 2b/
- 5d2b4016abe17ef24ef27ead7c8765a416ff15bd4bbdea9d00c3d543c80712eb (26.64 GB)
- 8e/
- 59/
- 8e597e0eb9f09e71a5b0332d570c981acec5bacf829c32c96611e31dccbf42f2 (27.82 GB)
- a0/
- c2/
- a0c222b7a4ee76ddf568935fe9c3f6eaec091d5a01add17f9bfca36c39b2bd2f (29.24 GB)
- ab/
- aa/
- abaae528c035b713a58aac475c24cab47298fb22328077ab5c5817f9838265cf (29.39 GB)
- b8/
- 3e/
- b83ea8904a78c8464de6cc68b9469579720c6bdd9f920e11dc6f2dad55d3d155 (29.52 GB)
- bd/
- a9/
- bda92d7820be87ef3b5a7f9049e8665c4dde7284e6516fd0032d0b8f7256c34a (29.53 GB)
- cc/
- 43/
- cc438db53d75c1fe1765021c6a6aa769535b54f3b37b43045043025f90684377 (30.9 GB)
- 99/
- cc991c8a17630e7c6650e76426445655f53805f7e4c7a31074a66483406fddd5 (30.95 GB)
- cd/
- 70/
- cd70ca408a32c4005f811566b716ce09a9b5a0ded7a0299c992d7d99de52e71e (31.15 GB)
- b7/
- cdb7b2915c5f32f5496411e8c806aa98dbb82a389bfbfc738738f5d5e5c38931 (31.16 GB)
- d2/
- 77/
- d2771d871aab637359e9f764ab647fe9b56e59477d3632d0080b63a2bf823c3a (31.59 GB)
- e2/
- d2e2b93393a1dbbcbf63a1ea64fd6fb97c1b2f8230d2d60b755e352d9dfb3f8a (32.25 GB)
- dc/
- ae/
- dcae9fbb4429e9772b8159c9c70d9c14d580f298124ddb762651f70add6dfce7 (37.39 GB)
- e0/
- dce0f91395ceb8aea2413e4af9bf166e5e93b8d7b028daca45cc4391560333b1 (37.48 GB)
- e5/
- ae/
- e5aed9b405031f350c979620b65b86a6655df86b2c078ac399f638a38445600e (37.63 GB)
- f3/
- 64/
- f364c1f06b0c8b804c7d03e956817413c2eb52866f7bec123e46abc06a9fb170 (37.88 GB)
- f8/
- 3b/
- f83bb54737b6c5258fefc43f66d0fa6678d7d70126467ddc8eb945768615c4ca (38.82 GB)
- f9/
- 85/
- f985f11bd88135fc9ff5bc8e02d1268627287ad6bd5df354488c341bb278258f (38.97 GB)
- fb/
- 2b/
- fb2b4a91ac8d40976e0cfc36c43d5d45c8350d285c165db59a18f4c45a6c65aa (39.64 GB)
- fc/
- a2/
- fca27918051cb8bd48336cfd9893bbc6fa5c71f1af38c5ab745175d13da3077f (40.89 GB)
- logs/
- HEAD (40.89 GB)
- refs/
- heads/
- main (40.89 GB)
- remotes/
- origin/
- HEAD (40.89 GB)
- objects/
- pack/
- pack-a2f2ba428232b1d100aa2720f5f3468e1bd0398b.idx (40.89 GB)
- pack-a2f2ba428232b1d100aa2720f5f3468e1bd0398b.pack (40.9 GB)
- packed-refs (40.9 GB)
- refs/
- heads/
- main (40.9 GB)
- remotes/
- origin/
- HEAD (40.9 GB)
- .gitattributes (40.9 GB)
- ApolloCorpus.zip (46.04 GB)
- README.md (46.04 GB)
- assets/
- apollo_medium_final.png (46.04 GB)
- dataset.png (46.04 GB)
- result.png (46.04 GB)
- check_exam/
- check_en.py (46.04 GB)
- check_es.py (46.04 GB)
- check_fr.py (46.04 GB)
- check_zh.py (46.04 GB)
- questions/
- ar_question.json (46.04 GB)
- en_question.json (46.05 GB)
- es_question.json (46.05 GB)
- fr_question.json (46.05 GB)
- hi_question.json (46.05 GB)
- zh_question.json (46.05 GB)
- train/
- pretrain/
- medicalBook_en_qa.json (46.9 GB)
- medicalBook_en_text.json (48.15 GB)
- medicalBook_zh_qa.json (48.81 GB)
- medicalBook_zh_text.json (49.23 GB)
- medicalGuideline_en_qa.json (49.38 GB)
- medicalGuideline_en_text.json (49.52 GB)
- medicalPaper_en_qa.json (50.43 GB)
- medicalPaper_en_text.json (51.47 GB)
- medicalPaper_es_qa.json (51.59 GB)
- medicalPaper_es_text.json (51.75 GB)
- medicalPaper_fr_qa.json (51.76 GB)
- medicalPaper_fr_text.json (51.78 GB)
- medicalPaper_zh_qa.json (51.99 GB)
- medicalPaper_zh_text.json (52.14 GB)
- medicalWeb_en_qa.json (52.81 GB)
- medicalWeb_en_text.json (54.85 GB)
- medicalWeb_es_qa.json (55.01 GB)
- medicalWeb_es_text.json (55.22 GB)
- medicalWeb_zh_qa.json (56.2 GB)
- medicalWeb_zh_text.json (57.57 GB)
- medicalWiki_en_qa.json (57.82 GB)
- medicalWiki_en_text.json (58.76 GB)
- medicalWiki_fr_qa.json (58.77 GB)
- medicalWiki_fr_text.json (58.79 GB)
- medicalWiki_hi_qa.json (58.79 GB)
- medicalWiki_hi_text.json (58.79 GB)
- question_dup.py (58.79 GB)
- sft/
- code_en.json (58.83 GB)
- code_zh.json (58.85 GB)
- general_ar.json (58.93 GB)
- general_en.json (59.69 GB)
- general_es.json (59.76 GB)
- general_fr.json (59.84 GB)
- general_hi.json (59.95 GB)
- general_zh.json (60.25 GB)
- math_en.json (60.3 GB)
- math_zh.json (60.31 GB)
- medicalExam_en.json (60.49 GB)
- medicalExam_en_clean.json (60.67 GB)
- medicalExam_en_dup.json (60.67 GB)
- medicalExam_en_dup_question.json (60.67 GB)
- medicalExam_es.json (60.67 GB)
- medicalExam_es_clean.json (60.67 GB)
- medicalExam_es_dup.json (60.67 GB)
- medicalExam_es_dup_question.json (60.67 GB)
- medicalExam_fr.json (60.67 GB)
- medicalExam_fr_clean.json (60.67 GB)
- medicalExam_fr_dup.json (60.67 GB)
- medicalExam_fr_dup_question.json (60.67 GB)
- medicalExam_zh.json (60.8 GB)
- medicalExam_zh_clean.json (60.92 GB)
- medicalExam_zh_dup.json (60.92 GB)
- medicalExam_zh_dup_question.json (60.92 GB)
- medicalPatient_ar.json (60.97 GB)
- medicalPatient_en.json (61.43 GB)
- medicalPatient_zh.json (61.63 GB)
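
The training files in the listing above appear to follow a `<source>_<language>[_<variant>].json` naming pattern (for example `medicalBook_en_qa.json` or `general_ar.json`). The sketch below is one possible way to group such files by language using only their names. The pattern is inferred from the listing and is not a documented convention; the helpers `parse_name` and `group_by_language` are hypothetical, and nothing here assumes anything about the JSON contents.

```python
import re
from collections import defaultdict

# Assumed pattern, inferred from the file listing only:
#   <source>_<lang>[_<variant>].json, with lang one of the six dataset languages.
NAME_RE = re.compile(
    r"^(?P<source>[A-Za-z]+)_(?P<lang>ar|en|es|fr|hi|zh)(?:_(?P<variant>\w+))?\.json$"
)


def parse_name(filename):
    """Return (source, lang, variant) for a matching name, else None."""
    match = NAME_RE.match(filename)
    if match is None:
        return None
    return match.group("source"), match.group("lang"), match.group("variant")


def group_by_language(filenames):
    """Map each language code to the matching filenames, in input order."""
    groups = defaultdict(list)
    for name in filenames:
        parsed = parse_name(name)
        if parsed is not None:
            groups[parsed[1]].append(name)
    return dict(groups)


# A few names copied from the listing; question_dup.py does not match and is skipped.
sample = [
    "medicalBook_en_qa.json",
    "medicalWiki_hi_text.json",
    "general_ar.json",
    "medicalExam_zh_dup_question.json",
    "question_dup.py",
]

if __name__ == "__main__":
    for lang, names in sorted(group_by_language(sample).items()):
        print(lang, names)
```

Files under `questions/` (such as `en_question.json`) put the language code first, so they would need a separate pattern and are deliberately left unmatched here.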