Interactive Evaluation of Dialog on DSTC9
Metrics
Coherent
Consistent
Diversity
Error Recovery
Flexible
Informative
Inquisitive
Likeable
Overall Human Rating
Topic Depth
Understanding
Results
Performance results of various models on this benchmark
Comparison table
Model name | Coherent | Consistent | Diversity | Error Recovery | Flexible | Informative | Inquisitive | Likeable | Overall Human Rating | Topic Depth | Understanding |
---|---|---|---|---|---|---|---|---|---|---|---|
a-unified-pre-training-framework-for | 2.8017 | 0.9390 | 2.7441 | 2.7518 | 2.8000 | 2.7881 | 2.7949 | 2.7878 | 4.15 | 2.7678 | 2.8285 |