
Question Answering on PubMedQA

Metrics

Accuracy

Results

Performance results of various models on this benchmark

Comparison table

| Model name | Accuracy (%) |
| --- | --- |
| mediswift-efficient-sparse-pre-trained | 76.8 |
| large-language-models-encode-clinical | 34 |
| biogpt-generative-pre-trained-transformer-for | 78.2 |
| large-language-models-encode-clinical | 57.8 |
| domain-specific-language-model-pretraining | 55.84 |
| the-claude-3-model-family-opus-sonnet-haiku | 75.8 |
| evaluation-of-large-language-model | 76.80 |
| galactica-a-large-language-model-for-science-1 | 77.6 |
| pubmedqa-a-dataset-for-biomedical-research | 78.0 |
| bioelectra-pretrained-biomedical-text-encoder | 64.2 |
| linkbert-pretraining-language-models-with | 70.2 |
| galactica-a-large-language-model-for-science-1 | 73.6 |
| large-language-models-encode-clinical | 79 |
| towards-expert-level-medical-question | 74.0 |
| towards-expert-level-medical-question | 75.0 |
| large-language-models-encode-clinical | 77.2 |
| biomedgpt-open-multimodal-generative-pre | 76.1 |
| large-language-models-encode-clinical | 55 |
| towards-expert-level-medical-question | 79.2 |
| linkbert-pretraining-language-models-with | 72.2 |
| galactica-a-large-language-model-for-science-1 | 70.2 |
| meditron-70b-scaling-medical-pretraining-for | 81.6 |
| biogpt-generative-pre-trained-transformer-for | 81.0 |
| the-claude-3-model-family-opus-sonnet-haiku | 74.9 |
| can-large-language-models-reason-about | 78.2 |
| the-cot-collection-improving-zero-shot-and | 73.42 |
| large-language-models-encode-clinical | 75.2 |
| large-language-models-encode-clinical | 67.6 |
| rankrag-unifying-context-ranking-with | 79.8 |
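For reference, the Accuracy values above are the percentage of questions for which a model's yes/no/maybe decision matches the gold label. Below is a minimal sketch of how such a score can be computed on PubMedQA's labeled split, assuming the Hugging Face `datasets` library and its `pubmed_qa` dataset; `predict` is a hypothetical stand-in for any of the models in the table, not an API from this leaderboard.

```python
# Minimal sketch: computing Accuracy on PubMedQA's labeled split (PQA-L),
# assuming the Hugging Face `datasets` library. Field names below
# (`question`, `context`, `final_decision`) follow the Hub schema.
from datasets import load_dataset


def predict(question: str, context: str) -> str:
    """Hypothetical model call; must return 'yes', 'no', or 'maybe'."""
    return "yes"  # trivial baseline, used only to make the sketch runnable


# PQA-L ships as a single 1,000-example 'train' split on the Hub.
dataset = load_dataset("pubmed_qa", "pqa_labeled", split="train")

correct = 0
for example in dataset:
    # Each example has a question, a structured abstract as context,
    # and a gold yes/no/maybe label in `final_decision`.
    context = " ".join(example["context"]["contexts"])
    correct += int(predict(example["question"], context) == example["final_decision"])

accuracy = 100.0 * correct / len(dataset)  # a percentage, as in the table
print(f"Accuracy: {accuracy:.1f}%")
```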