VQA Visual Question Answering Dataset
Date
Size
License
Other
This dataset comes from the MO434 course knowledge base at the State University of Campinas (UNICAMP).
Introduction
This is a simple Flask application that answers natural language questions about an image. Behind the scenes, the application uses a deep learning model trained with TensorFlow.
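To make the request flow concrete, here is a minimal sketch of how such a Flask endpoint might look. The route name `/answer`, the form fields `image` and `question`, and the `answer_question` helper are all assumptions made for illustration; the source does not describe the application's actual routes or its model-loading code, so the helper is a placeholder that returns a fixed string instead of running the trained model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def answer_question(image_bytes: bytes, question: str) -> str:
    """Hypothetical stand-in for the trained model's inference call.

    A real application would preprocess the image and question here and
    run them through the TensorFlow model. This placeholder returns a
    fixed string so the sketch stays self-contained.
    """
    return "unknown"


@app.route("/answer", methods=["POST"])  # route name is an assumption
def answer():
    # The image arrives as a multipart file upload, the question as a form field.
    image = request.files.get("image")
    question = request.form.get("question", "").strip()

    if image is None or not question:
        return jsonify(error="both 'image' and 'question' are required"), 400

    return jsonify(answer=answer_question(image.read(), question))
```

A client would send a multipart POST request containing the image file and the question text, and receive a JSON object with a single `answer` field.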
Model Overview
Advances in deep learning have driven progress on multimodal learning tasks. Visual Question Answering (VQA) is a particularly challenging example: it requires high-level interpretation of the scene in an image together with language modeling of the question and its answer. Given an image and a natural language question about it, the task is to produce an accurate natural language answer. This project is an end-to-end system, implemented with Keras, that aims to accomplish this task.
The model architecture is based on the paper Hierarchical Question-Image Co-Attention for Visual Question Answering.
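As a rough illustration of the idea behind that paper, the following NumPy sketch implements one parallel co-attention step: an affinity matrix between question and image features is used to compute attention weights over image regions and over question positions at the same time. This is only a sketch under stated assumptions. The function name, the dimensions, and the random weights are illustrative, the code is plain NumPy and not the project's Keras implementation, and the paper's full model also applies attention hierarchically at the word, phrase, and question levels, which is not shown here.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def parallel_co_attention(V, Q, W_b, W_v, W_q, w_hv, w_hq):
    """One parallel co-attention step.

    V: image features, shape (d, N), one column per image region.
    Q: question features, shape (d, T), one column per question position.
    W_b: (d, d); W_v, W_q: (k, d); w_hv, w_hq: (k,). All weights are
    illustrative parameters that would be learned in a real model.
    """
    # Affinity between every question position and every image region.
    C = np.tanh(Q.T @ W_b @ V)                  # (T, N)

    # Hidden attention maps, each informed by the other modality via C.
    H_v = np.tanh(W_v @ V + (W_q @ Q) @ C)      # (k, N)
    H_q = np.tanh(W_q @ Q + (W_v @ V) @ C.T)    # (k, T)

    # Attention distributions over image regions and question positions.
    a_v = softmax(w_hv @ H_v)                   # (N,)
    a_q = softmax(w_hq @ H_q)                   # (T,)

    # Attended feature vectors: attention-weighted sums of the columns.
    v_hat = V @ a_v                             # (d,)
    q_hat = Q @ a_q                             # (d,)
    return v_hat, q_hat, a_v, a_q


# Toy example with random features and weights (all sizes are arbitrary).
rng = np.random.default_rng(0)
d, k, N, T = 8, 4, 6, 5
V = rng.standard_normal((d, N))
Q = rng.standard_normal((d, T))
W_b = rng.standard_normal((d, d))
W_v = rng.standard_normal((k, d))
W_q = rng.standard_normal((k, d))
w_hv = rng.standard_normal(k)
w_hq = rng.standard_normal(k)

v_hat, q_hat, a_v, a_q = parallel_co_attention(V, Q, W_b, W_v, W_q, w_hv, w_hq)
```

In a full model, the attended image and question vectors would then be combined and passed to a classifier over candidate answers.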