Bias Detection On Stereoset 1
평가 지표
ICAT Score
LMS
SS
평가 결과
이 벤치마크에서 각 모델의 성능 결과
모델 이름 | ICAT Score | LMS | SS | Paper Title | Repository |
---|---|---|---|---|---|
OPT 175B | 60 | 74.8 | 59.9 | Galactica: A Large Language Model for Science | |
GPT-2 (medium) | 71.73 | - | - | StereoSet: Measuring stereotypical bias in pretrained language models | |
BERT (large) | 69.89 | - | - | StereoSet: Measuring stereotypical bias in pretrained language models | |
GPT-2 (large) | 70.54 | - | - | StereoSet: Measuring stereotypical bias in pretrained language models | |
GAL 120B | 65.6 | 75 | 56.2 | Galactica: A Large Language Model for Science | |
XLNet (large) | 72.03 | - | - | StereoSet: Measuring stereotypical bias in pretrained language models | |
XLNet (base) | 62.10 | - | - | StereoSet: Measuring stereotypical bias in pretrained language models | |
BERT (base) | 71.21 | - | - | StereoSet: Measuring stereotypical bias in pretrained language models | |
RoBERTa (base) | 67.50 | - | - | StereoSet: Measuring stereotypical bias in pretrained language models | |
GPT-3 (text-davinci-002) | 60.8 | 77.6 | 60.8 | Galactica: A Large Language Model for Science | |
GPT-2 (small) | 72.97 | - | - | StereoSet: Measuring stereotypical bias in pretrained language models |
0 of 11 row(s) selected.