Long-Range Modeling on SCROLLS
Evaluation Metrics
Avg.
CNLI
GovRep
Nrtv
QALT EM-T/H
QMSum
Qspr
SumScr
Evaluation Results
Performance of each model on this benchmark
Comparison Table
Model Name | Avg. | CNLI | GovRep | Nrtv | QALT EM-T/H | QMSum | Qspr | SumScr |
---|---|---|---|---|---|---|---|---|
longt5-efficient-text-to-text-transformer-for | 38.6 | 85.6 | 57.7 / 30.0 / 31.4 | 23.0 | 37.9 / 36.6 | 33.9 / 11.0 / 22.8 | 46.6 | 34.8 / 9.6 / 21.1 |
adapting-pretrained-text-to-text-models-for | 39.76 | 87.1 | 59.4 / 29.8 / 30.8 | 26.2 | 37.8 / 34.0 | 35.1 / 11.0 / 22.0 | 48.7 | 37.7 / 10.2 / 21.5 |
efficient-long-text-understanding-with-short | 37.99 | 87.3 | 57.5 / 26.3 / 27.4 | 24.1 | 34.8 / 34.8 | 34.2 / 11.0 / 22.0 | 46.9 | 35.2 / 8.7 / 19.4 |
longt5-efficient-text-to-text-transformer-for | 42.53 | 88.2 | 61.1 / 32.3 / 33.7 | 29.3 | 46.0 / 42.1 | 34.9 / 11.8 / 23.5 | 53.1 | 35.8 / 9.6 / 21.1 |
longt5-efficient-text-to-text-transformer-for | 41.03 | 87.3 | 61.3 / 32.2 / 33.8 | 27.2 | 40.6 / 38.6 | 35.1 / 12.0 / 23.3 | 52.3 | 60.3 / 31.1 / 32.8 |
scrolls-standardized-comparison-over-long | 19.35 | 66 | 45.3 / 17.9 / 20.8 | 1.5 | 25.2 / 26.1 | 14.2 / 2.0 / 9.3 | 3.4 | 19.6 / 1.8 / 11.0 |
unifying-language-learning-paradigms | - | 88.7 | - | - | - | - | - | - |
scrolls-standardized-comparison-over-long | 29.01 | 77.4 | 47.9 / 18.6 / 22.7 | 15.4 | 26.0 / 25.9 | 30.2 / 8.7 / 20.7 | 26.3 | 27.2 / 4.9 / 16.7 |
investigating-efficiently-extending | - | - | 59.3 / 29.3 / 30.9 | - | - | 32.9 / 9.8 / 21.4 | - | 35.0 / 8.9 / 20.4 |
scrolls-standardized-comparison-over-long | - | - | - | - | - | - | - | - |
colt5-faster-long-range-transformers-with | 43.51 | 88.4 | 61.3 / 32.2 / 33.8 | 31.1 | 48.1 / 43.8 | 36.2 / 12.9 / 24.3 | 53.9 | 36.4 / 10.2 / 21.7 |
unifying-language-learning-paradigms | 37.87 | - | 53.6 / 26.1 / 28.8 | 24.2 | 45.8 / 40.7 | 31.1 / 8.5 / 20.4 | 37.6 | 32.9 / 7.8 / 19.4 |
investigating-efficiently-extending | - | - | 60.3 / 30.0 / 31.5 | - | - | 33.2 / 9.6 / 21.6 | - | 35.7 / 9.1 / 20.6 |
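The page does not say how the Avg. column is derived from the per-task cells. The numbers in the complete rows appear consistent with the following rule, which is treated here as an assumption: each three-valued cell (GovRep, QMSum, SumScr) is collapsed to the geometric mean of its three values, the QALT cell contributes only its first value (EM-T), and Avg. is the plain mean of the seven resulting task scores. A minimal Python sketch that checks this against the colt5 row is given below; the function names and the `ROUGE_COLUMNS` set are hypothetical helpers, not part of any official scoring tool.

```python
from math import prod
from typing import Dict, List, Optional

# Assumption: these three columns hold three slash-separated scores that are
# collapsed with a geometric mean before averaging.
ROUGE_COLUMNS = {"GovRep", "QMSum", "SumScr"}


def parse_cell(cell: str) -> Optional[List[float]]:
    """Split a cell like '61.3 / 32.2 / 33.8' into floats; '-' means no result."""
    cell = cell.strip()
    if cell in ("", "-"):
        return None
    return [float(part) for part in cell.split("/")]


def task_score(column: str, values: List[float]) -> float:
    """Reduce one task's values to a single score under the assumed rule."""
    if column in ROUGE_COLUMNS:
        return prod(values) ** (1.0 / len(values))  # geometric mean
    return values[0]  # single-valued cells, or EM-T for 'QALT EM-T/H'


def scrolls_average(row: Dict[str, str]) -> Optional[float]:
    """Mean of the task scores, or None if any task cell is missing."""
    scores = []
    for column, cell in row.items():
        values = parse_cell(cell)
        if values is None:
            return None
        scores.append(task_score(column, values))
    return sum(scores) / len(scores)


# Cells copied from the colt5 row of the table above (Avg. listed as 43.51).
colt5_row = {
    "CNLI": "88.4",
    "GovRep": "61.3 / 32.2 / 33.8",
    "Nrtv": "31.1",
    "QALT EM-T/H": "48.1 / 43.8",
    "QMSum": "36.2 / 12.9 / 24.3",
    "Qspr": "53.9",
    "SumScr": "36.4 / 10.2 / 21.7",
}

avg = scrolls_average(colt5_row)
print(f"recomputed Avg. = {avg:.2f}")
```

Rows with a "-" in any task column return `None` here, which matches the table's practice of leaving Avg. empty for most partial submissions. A recomputation like this can also help flag rows whose listed Avg. does not agree with their task cells.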