HyperAI超神经

This dataset comes from the MO434 course knowledge base at the State University of Campinas (UNICAMP).

Introduction

This is a simple Flask application that answers natural language questions about an image. Given an image and a question about it, the application generates an answer using a deep learning model trained with TensorFlow.
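To make the request flow concrete, here is a minimal sketch of how such a Flask endpoint could look. It is not the application's actual code: the `/answer` route, the `image` and `question` form field names, and the `predict_answer` helper are all hypothetical, and the helper returns a fixed placeholder instead of running a trained model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_answer(image_bytes: bytes, question: str) -> str:
    """Hypothetical stand-in for the trained model's inference call."""
    # A real application would decode the image, tokenize the question,
    # and run both through the loaded TensorFlow/Keras model here.
    return "unknown"


@app.route("/answer", methods=["POST"])  # route name is an assumption
def answer():
    # Both inputs arrive as multipart form data: an uploaded image file
    # and a free-text question about that image.
    image = request.files.get("image")
    question = request.form.get("question", "").strip()
    if image is None or not question:
        return jsonify(error="both 'image' and 'question' are required"), 400

    predicted = predict_answer(image.read(), question)
    return jsonify(question=question, answer=predicted)
```

Calling `app.run()` starts a local development server. A client would then POST an image file and a question to the endpoint and receive the predicted answer as JSON.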

Model Overview

Advances in deep learning have driven progress on multimodal learning tasks. Visual Question Answering (VQA) is a particularly challenging example: it requires high-level interpretation of the scene in an image, modeled jointly with the language of the question and answer. Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. This project is an end-to-end system, implemented with Keras, that aims to accomplish this task.

The model architecture is based on the paper Hierarchical Question-Image Co-Attention for Visual Question Answering.
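The core idea of co-attention is that the question guides attention over image regions while the image guides attention over question words. The NumPy sketch below illustrates one co-attention step in the style of the parallel variant described in that paper. It is an illustration under stated assumptions, not this project's Keras implementation: the weights are random rather than trained, the dimensions are arbitrary, and the function and variable names are chosen here for readability.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def parallel_co_attention(V, Q, W_b, W_v, W_q, w_hv, w_hq):
    """One parallel co-attention step.

    V: (d, N) features for N image regions.
    Q: (d, T) features for T question positions.
    Returns the attended image and question vectors and both attention maps.
    """
    # Affinity between every question position and every image region.
    C = np.tanh(Q.T @ W_b @ V)                  # (T, N)

    # Each modality's hidden state also sees the other modality via C.
    H_v = np.tanh(W_v @ V + (W_q @ Q) @ C)      # (k, N)
    H_q = np.tanh(W_q @ Q + (W_v @ V) @ C.T)    # (k, T)

    # Attention distributions over image regions and question positions.
    a_v = softmax(w_hv @ H_v)                   # (N,)
    a_q = softmax(w_hq @ H_q)                   # (T,)

    # Attention-weighted sums give one vector per modality.
    v_hat = V @ a_v                             # (d,)
    q_hat = Q @ a_q                             # (d,)
    return v_hat, q_hat, a_v, a_q


# Toy dimensions and random weights, for illustration only.
rng = np.random.default_rng(0)
d, k, N, T = 8, 4, 6, 5
V = rng.standard_normal((d, N))
Q = rng.standard_normal((d, T))
W_b = rng.standard_normal((d, d)) * 0.1
W_v = rng.standard_normal((k, d)) * 0.1
W_q = rng.standard_normal((k, d)) * 0.1
w_hv = rng.standard_normal(k)
w_hq = rng.standard_normal(k)

v_hat, q_hat, a_v, a_q = parallel_co_attention(V, Q, W_b, W_v, W_q, w_hv, w_hq)
print("image attention:", np.round(a_v, 3))
print("question attention:", np.round(a_q, 3))
```

In the hierarchical model, a step like this is applied at more than one level of the question representation (word, phrase, and whole question), and the attended vectors are combined to predict the answer.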

VQA Visual Question Answering Dataset
