HyperAI

Task 1 Grouping on OCW

Metrics

Wasserstein Distance (WD)
# Correct Groups
# Solved Walls
Adjusted Mutual Information (AMI)
Adjusted Rand Index (ARI)
Fowlkes Mallows Score (FMS)
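
As a rough illustration of how some of the listed metrics behave on a grouping task, the sketch below computes the Adjusted Rand Index, the Fowlkes Mallows Score, and the number of correct groups for one 16-item wall using the standard pair-counting definitions. It is not the benchmark's evaluation script: the function names and the toy labels are hypothetical, and AMI and WD are left out for brevity.

```python
from collections import Counter
from math import comb, sqrt


def pair_counts(true_labels, pred_labels):
    """Return (pairs grouped together in both, in truth, in prediction)."""
    both = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    in_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    in_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    return both, in_true, in_pred


def adjusted_rand_index(true_labels, pred_labels):
    """Rand index corrected for chance agreement (1.0 means identical groupings)."""
    both, in_true, in_pred = pair_counts(true_labels, pred_labels)
    expected = in_true * in_pred / comb(len(true_labels), 2)
    max_index = (in_true + in_pred) / 2
    if max_index == expected:
        return 1.0
    return (both - expected) / (max_index - expected)


def fowlkes_mallows(true_labels, pred_labels):
    """Geometric mean of pairwise precision and recall."""
    both, in_true, in_pred = pair_counts(true_labels, pred_labels)
    if in_true == 0 or in_pred == 0:
        return 0.0
    return both / sqrt(in_true * in_pred)


def groups_as_sets(labels):
    """Turn a label list into a set of frozensets of item indices."""
    groups = {}
    for item, label in enumerate(labels):
        groups.setdefault(label, set()).add(item)
    return {frozenset(members) for members in groups.values()}


def count_correct_groups(true_labels, pred_labels):
    """Number of predicted groups that exactly match a true group."""
    return len(groups_as_sets(true_labels) & groups_as_sets(pred_labels))


# Toy wall (hypothetical): 16 items, 4 true groups of 4.
true_wall = [0] * 4 + [1] * 4 + [2] * 4 + [3] * 4
# Prediction gets two groups right and swaps one item between the other two.
pred_wall = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 2]

ari = adjusted_rand_index(true_wall, pred_wall)
fms = fowlkes_mallows(true_wall, pred_wall)
correct = count_correct_groups(true_wall, pred_wall)
solved = correct == len(groups_as_sets(true_wall))
print(f"ARI={ari:.4f} FMS={fms:.4f} correct_groups={correct} solved={solved}")
```

A wall counts as solved here only when every predicted group matches a true group exactly, which is why a single swapped item costs two correct groups at once.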

Results

Performance results of various models on this benchmark

| Model Name | Wasserstein Distance (WD) | # Correct Groups | # Solved Walls | Adjusted Mutual Information (AMI) | Adjusted Rand Index (ARI) | Fowlkes Mallows Score (FMS) | Paper Title |
|---|---|---|---|---|---|---|---|
| BERT (BASE) | 89.5 ± 0.4 | 22 ± 2 | 0 ± 0 | 8.1 ± 0.4 | 6.4 ± 0.3 | 25.1 ± 0.2 | Pre-Training of Deep Bidirectional Protein Sequence Representations with Structural Information |
| GPT-3.5-turbo (0-shot) | 82.5 | 114 | 0 | 21.6 | 18.4 | 34.0 | GPT-4 Technical Report |
| GPT-3.5-turbo (1-shot) | 82.3 | 123 | 0 | 21.2 | 18.2 | 34.4 | GPT-4 Technical Report |
| GPT-4 (1-shot) | 73.4 | 262 | 4 | 33.5 | 29.7 | 43.7 | GPT-4 Technical Report |
| GPT-3.5-turbo (10-shot) | 81.2 | 137 | 2 | 24.0 | 20.4 | 36.1 | GPT-4 Technical Report |
| E5 (LARGE) | 84.4 ± 0.7 | 76 ± 5 | 0 ± 0 | 18.5 ± 0.6 | 15.4 ± 0.5 | 32.3 ± 0.4 | Text Embeddings by Weakly-Supervised Contrastive Pre-training |
| FastText (News) | 85.5 ± 0.5 | 62 ± 3 | 0 ± 0 | 15.8 ± 0.3 | 13.0 ± 0.2 | 30.4 ± 0.2 | Learning Word Vectors for 157 Languages |
| FastText (Crawl) | 84.2 ± 0.5 | 80 ± 4 | 0 ± 0 | 18.4 ± 0.4 | 15.2 ± 0.3 | 32.1 ± 0.3 | Learning Word Vectors for 157 Languages |
| E5 (BASE) | 83.8 ± 0.6 | 89 ± 6 | 1 ± 0 | 19.5 ± 0.4 | 16.3 ± 0.4 | 33.1 ± 0.3 | Text Embeddings by Weakly-Supervised Contrastive Pre-training |
| BERT (LARGE) | 88.3 ± 0.5 | 33 ± 2 | 0 ± 0 | 10.3 ± 0.3 | 8.2 ± 0.3 | 26.5 ± 0.2 | Pre-Training of Deep Bidirectional Protein Sequence Representations with Structural Information |
| GPT-4 (5-shot) | 72.9 | 269 | 7 | 32.8 | 29.1 | 43.4 | GPT-4 Technical Report |
| Human Performance | - | 1405 | 285 | - | - | - | Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset |
| GPT-3.5-turbo (5-shot) | 80.6 | 149 | 2 | 25.4 | 22.0 | 37.3 | GPT-4 Technical Report |
| GloVe | 84.9 ± 0.4 | 68 ± 4 | 0 ± 0 | 17.6 ± 0.4 | 14.4 ± 0.3 | 31.5 ± 0.3 | - |
| GPT-4 (0-shot) | 75.8 | 239 | 6 | 30.7 | 27.2 | 41.5 | GPT-4 Technical Report |
| ELMo (LARGE) | - | 55 ± 4 | 0 ± 0 | 14.5 ± 0.4 | 11.8 ± 0.4 | 29.5 ± 0.3 | Deep contextualized word representations |
| DistilBERT (BASE) | - | 49 ± 4 | 0 ± 0 | 14.0 ± 0.3 | 11.3 ± 0.3 | 29.1 ± 0.2 | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter |
| GPT-3.5-turbo (3-shot) | 80.9 | 140 | 0 | 24.7 | 21.3 | 36.8 | GPT-4 Technical Report |
| GPT-4 (100-shot) | 73.6 | 249 | 3 | 32.3 | 28.5 | 42.8 | GPT-4 Technical Report |
| RoBERTa (LARGE) | - | 29 ± 3 | 0 ± 0 | 9.4 ± 0.4 | 8.4 ± 0.3 | 26.7 ± 0.2 | RoBERTa: A Robustly Optimized BERT Pretraining Approach |