HyperAI초신경

Question Answering on FEVER

Evaluation Metrics

EM
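
The page does not define EM. Assuming it stands for exact match, the usual question-answering metric, a score like the ones listed here could be computed roughly as in the sketch below. The normalization steps (lowercasing, removing punctuation and English articles) follow a common convention and are an assumption; the scoring scripts behind the listed results may differ. The function names `normalize` and `exact_match` are hypothetical.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(predictions, references) -> float:
    """Return the percentage of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    # Illustrative inputs only; these are not taken from the benchmark.
    preds = ["SUPPORTS", "refutes.", "Not Enough Info"]
    golds = ["supports", "REFUTES", "supports"]
    print(round(exact_match(preds, golds), 1))
```

Reporting the score as a percentage matches the scale of the values in the comparison table.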

Evaluation Results

Performance results of each model on this benchmark

Comparison Table

Model Name                                        EM
measuring-and-narrowing-the-compositionality      64.2
chain-of-action-faithful-and-multimodal           54.2
language-models-are-unsupervised-multitask        50
chain-of-action-faithful-and-multimodal           64.2
chain-of-action-faithful-and-multimodal           50
chain-of-action-faithful-and-multimodal           68.9
dspy-compiling-declarative-language-model         62.2
chain-of-action-faithful-and-multimodal           62.2