HyperAI

Interactive Evaluation of Dialog on DSTC9

Metrics

Coherent
Consistent
Diversity
Error Recovery
Flexible
Informative
Inquisitive
Likeable
Overall Human Rating
Topic Depth
Understanding

Results

Performance results of various models on this benchmark

Comparison Table
Model Name | Coherent | Consistent | Diversity | Error Recovery | Flexible | Informative | Inquisitive | Likeable | Overall Human Rating | Topic Depth | Understanding
a-unified-pre-training-framework-for | 2.8017 | 0.9390 | 2.7441 | 2.7518 | 2.8000 | 2.7881 | 2.7949 | 2.7878 | 4.15 | 2.7678 | 2.8285