Interactive Evaluation of Dialog on DSTC9
Evaluation Metrics
Coherent
Consistent
Diversity
Error Recovery
Flexible
Informative
Inquisitive
Likeable
Overall Human Rating
Topic Depth
Understanding
Evaluation Results
Performance of each model on this benchmark
Model | Coherent | Consistent | Diversity | Error Recovery | Flexible | Informative | Inquisitive | Likeable | Overall Human Rating | Topic Depth | Understanding | Paper Title | Repository |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PLATO-2 | 2.8017 | 0.9390 | 2.7441 | 2.7518 | 2.8000 | 2.7881 | 2.7949 | 2.7878 | 4.15 | 2.7678 | 2.8285 | A Unified Pre-training Framework for Conversational AI | |