HyperAI超神経
ホーム
ニュース
最新論文
チュートリアル
データセット
百科事典
SOTA
LLMモデル
GPU ランキング
学会
検索
サイトについて
日本語
システム
HyperAI超神経
Toggle sidebar
サイトを検索…
⌘
K
ログイン
ログイン
ホーム
SOTA
Sentence Ordering
Sentence Ordering On Econlogicqa
Sentence Ordering On Econlogicqa
評価指標
Accuracy
評価結果
このベンチマークにおける各モデルのパフォーマンス結果
Columns
モデル名
Accuracy
Paper Title
Repository
Gemma-7B-IT
0.0231
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning
Mistral-7B-Instruct-v0.1
0.1538
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning
Yi-6B-Chat
0.2077
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning
Mistral-7B-Instruct-v0.2
0.3154
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning
Llama-2-7B
0.0077
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning
Zephyr-7B-Alpha
0.2308
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning
Yi-6B
0.0385
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning
Llama-3-8B
0.2385
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning
Mistral-7B-v0.2
0.2615
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning
Mistral-7B-v0.1
0.2615
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning
Zephyr-7B-Beta
0.1769
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning
Llama-2-13B-Chat
0.1462
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning
Llama-2-7B-Chat
0.0923
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning
Llama-3-8B-Instruct
0.3462
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning
Gemma-2B-IT
0.0846
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning
GPT-3.5-Turbo
0.3769
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning
GPT-4
0.5538
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning
GPT-4-Turbo
0.5692
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning
0 of 18 row(s) selected.
Previous
Next