HyperAI

Question Answering on PeerQA

Metrics

AlignScore
Prometheus-2 Answer Correctness
Rouge-L

Results

Performance results of various models on this benchmark

Comparison table
Model name                             AlignScore  Prometheus-2 Answer Correctness  Rouge-L
language-models-are-few-shot-learners  0.1378      3.0408                           0.2414
mistral-7b                             0.0827      3.4245                           0.1922
Model 3                                0.1362      3.0571                           0.2294
gpt-4-technical-report-1               0.1224      3.4612                           0.2266
the-llama-3-herd-of-models             0.1098      3.1102                           0.2295
the-llama-3-herd-of-models             0.1016      3.1673                           0.2286