Question Answering on NarrativeQA
Evaluation Metrics
BLEU-1
BLEU-4
METEOR
Rouge-L
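All four metrics can be reproduced with standard open-source toolkits. The snippet below is a minimal sketch, assuming the `nltk` package (with its WordNet data downloaded) and Google's `rouge-score` package are installed; the example answer strings are illustrative only and are not drawn from NarrativeQA.

```python
# Minimal sketch: scoring one generated answer against one reference
# with BLEU-1, BLEU-4, METEOR, and ROUGE-L.
# Assumes: `pip install nltk rouge-score` and `nltk.download("wordnet")`.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer

# Illustrative strings (not from the dataset); recent NLTK versions
# expect pre-tokenized token lists for both BLEU and METEOR.
reference = "He hides the ring inside an old clock".split()
candidate = "He hid the ring in the old clock".split()

smooth = SmoothingFunction().method1  # avoids zero scores on short answers

# BLEU-1: unigram precision; BLEU-4: cumulative 1- to 4-gram precision.
bleu1 = sentence_bleu([reference], candidate,
                      weights=(1, 0, 0, 0), smoothing_function=smooth)
bleu4 = sentence_bleu([reference], candidate,
                      weights=(0.25, 0.25, 0.25, 0.25), smoothing_function=smooth)

# METEOR: unigram matching with stemming and WordNet synonym support.
meteor = meteor_score([reference], candidate)

# ROUGE-L: F-measure over the longest common subsequence.
rouge_l = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True).score(
    " ".join(reference), " ".join(candidate)
)["rougeL"].fmeasure

print(f"BLEU-1 {bleu1:.3f}  BLEU-4 {bleu4:.3f}  "
      f"METEOR {meteor:.3f}  ROUGE-L {rouge_l:.3f}")
```

Leaderboard scores are computed over the full test set (averaging per-question scores against all available reference answers), so single-pair numbers like the ones above are only a sanity check of the metric setup.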
Evaluation Results
The table below compares the performance of each model on this benchmark.
Comparison Table
Model | BLEU-1 | BLEU-4 | METEOR | Rouge-L |
---|---|---|---|---|
multi-style-generative-reading-comprehension | 48.7 | 20.98 | 21.95 | 54.74 |
commonsense-for-generative-multi-hop-question | 43.63 | 21.07 | 19.03 | 44.16 |
densely-connected-attention-propagation-for | 44.35 | 27.61 | 21.80 | 44.69 |
a-discrete-hard-em-approach-for-weakly | - | - | - | 58.8 |
distilling-knowledge-from-reader-to-retriever-1 | 35.3 | 7.5 | 11.1 | 32 |
cut-to-the-chase-a-context-zoom-in-network | 42.76 | 22.49 | 19.24 | 46.67 |
multi-style-generative-reading-comprehension | 54.11 | 30.43 | 26.13 | 59.87 |
the-narrativeqa-reading-comprehension | 54.60/55.55 | 26.71/27.78 | - | - |
multi-granular-sequence-encoding-via-dilated | 36.55 | 19.79 | 17.87 | 41.44 |
bidirectional-attention-flow-for-machine | 33.45 | 15.69 | 15.68 | 36.74 |