MVAN | 0.6765 | 3.73 | 54.65 | 91.47 | 83.85 | Multi-View Attention Network for Visual Dialog | |
CorefNMN (ResNet-152) | 0.641 | 4.45 | 50.92 | 88.81 | 80.18 | Visual Coreference Resolution in Visual Dialog using Neural Module Networks | |
RVA | 0.6634 | 3.93 | 52.71 | 90.73 | 82.97 | Recursive Visual Attention in Visual Dialog | |