Natural Language Inference on ANLI Test

Evaluation metrics

A1
A2
A3

Evaluation results

Performance of each model on this benchmark

| Model name | A1 | A2 | A3 | Paper Title | Repository |
| --- | --- | --- | --- | --- | --- |
| ChatGPT | 62.3 | 52.6 | 54.1 | A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets | |
| GPT-3 | 36.8 | 34 | 40.2 | Language Models are Few-Shot Learners | |
| PaLM 2-S (one-shot) | 53.1 | 48.8 | 53.2 | PaLM 2 Technical Report | |
| T0-11B (explanation prompting) | 75.6 | 60.6 | 59.9 | Prompting for explanations improves Adversarial NLI. Is this true? {Yes} it is {true} because {it weakens superficial cues} | - |
| KiC-770M | 36.30 | 35.00 | 37.60 | Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models | - |
| BLOOM 176B (one-shot) | 33.6 | 33.8 | 35.17 | BloombergGPT: A Large Language Model for Finance | - |
| OPT 66B (one-shot) | 33.1 | 34.2 | 34.92 | BloombergGPT: A Large Language Model for Finance | - |
| PaLM 540B (Self Consistency) | - | 64.5 | 63.4 | Large Language Models Can Self-Improve | - |
| PaLM 540B (Self Improvement, Self Consistency) | - | 66.5 | 67.9 | Large Language Models Can Self-Improve | - |
| RoE-3B | 35.49 | 34.64 | 31.22 | Exploring the Benefits of Training Expert Language Models over Instruction Tuning | |
| GPT-NeoX (one-shot) | 32.6 | 33.8 | 36.17 | BloombergGPT: A Large Language Model for Finance | - |
| T5-3B (explanation prompting) | 81.8 | 72.5 | 74.8 | Prompting for explanations improves Adversarial NLI. Is this true? {Yes} it is {true} because {it weakens superficial cues} | - |
| InfoBERT (RoBERTa) | 75 | 50.5 | 47.7 | InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective | |
| PaLM 2-L (one-shot) | 73.1 | 63.4 | 67.1 | PaLM 2 Technical Report | |
| T0-3B (CoT fine-tuned) | 41.7 | 37.2 | 41.9 | The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | - |
| PaLM 540B (CoT Prompting) | - | 58.9 | 60.6 | Large Language Models Can Self-Improve | - |
| RoBERTa (Large) | 72.4 | 49.8 | 44.4 | RoBERTa: A Robustly Optimized BERT Pretraining Approach | |
| PaLM 540B (Self Improvement, Standard-Prompting) | - | 64.8 | 66.9 | Large Language Models Can Self-Improve | - |
| Bloomberg GPT (one-shot) | 32.9 | 34.4 | 37.33 | BloombergGPT: A Large Language Model for Finance | - |
| XLNet (Large) | 70.3 | 50.9 | 49.4 | XLNet: Generalized Autoregressive Pretraining for Language Understanding | |