Interactive Evaluation of Dialog on DSTC9
Metrics
Coherent
Consistent
Diversity
Error Recovery
Flexible
Informative
Inquisitive
Likeable
Overall Human Rating
Topic Depth
Understanding
Results
Performance results of various models on this benchmark
Comparison table
Model name | Coherent | Consistent | Diversity | Error Recovery | Flexible | Informative | Inquisitive | Likeable | Overall Human Rating | Topic Depth | Understanding |
---|---|---|---|---|---|---|---|---|---|---|---|
a-unified-pre-training-framework-for | 2.8017 | 0.9390 | 2.7441 | 2.7518 | 2.8000 | 2.7881 | 2.7949 | 2.7878 | 4.15 | 2.7678 | 2.8285 |