Interactive Evaluation of Dialog on DSTC9
Evaluation Metrics
Coherent
Consistent
Diversity
Error Recovery
Flexible
Informative
Inquisitive
Likeable
Overall Human Rating
Topic Depth
Understanding
Evaluation Results
Performance of each model on this benchmark
Comparison Table
Model Name | Coherent | Consistent | Diversity | Error Recovery | Flexible | Informative | Inquisitive | Likeable | Overall Human Rating | Topic Depth | Understanding |
---|---|---|---|---|---|---|---|---|---|---|---|
a-unified-pre-training-framework-for | 2.8017 | 0.9390 | 2.7441 | 2.7518 | 2.8000 | 2.7881 | 2.7949 | 2.7878 | 4.15 | 2.7678 | 2.8285 |