Title: Open-Ended Visual Question-Answering

Oct 09, 2016

Issey Masuda, Santiago Pascual de la Puente, Xavier Giro-i-Nieto

Abstract: This thesis report studies methods for solving Visual Question-Answering (VQA) tasks with a Deep Learning framework. As a preliminary step, we explore Long Short-Term Memory (LSTM) networks, as used in Natural Language Processing (NLP), to tackle text-based Question-Answering. We then modify this model to accept an image as an input in addition to the question. For this purpose, we explore the VGG-16 and K-CNN convolutional neural networks to extract visual features from the image. These features are merged with the word embedding or with a sentence embedding of the question to predict the answer. This work was successfully submitted to the Visual Question Answering Challenge 2016, where it achieved an accuracy of 53.62% on the test dataset. The developed software follows best programming practices and Python code style, and provides a consistent baseline in Keras for different configurations.

* Bachelor thesis report graded A with honours at ETSETB Telecom BCN school, Universitat Politècnica de Catalunya (UPC), June 2016. Source code and models are publicly available at http://imatge-upc.github.io/vqa-2016-cvprw/
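
The merge step described in the abstract (visual features combined with a sentence embedding of the question, then a classifier over candidate answers) can be illustrated with a minimal sketch. This is not the released code: it assumes the merge is a plain concatenation, uses mean pooling of word vectors as a stand-in for the LSTM sentence embedding, and fills all weights, embeddings, and image features with random toy values. Every name in it (`mean_pool`, `predict_answer`, the toy vocabulary and answer list) is hypothetical.

```python
import math
import random


def mean_pool(vectors):
    """Average equal-length word vectors into one sentence embedding.

    Assumption: a stand-in for the LSTM sentence embedding in the abstract.
    """
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_answer(image_features, question_tokens, embeddings,
                   weights, biases, answers):
    """Merge image and question features, then score each candidate answer."""
    word_vectors = [embeddings[t] for t in question_tokens if t in embeddings]
    if not word_vectors:
        raise ValueError("no known words in the question")
    sentence = mean_pool(word_vectors)
    # Assumption: the merge is a simple concatenation of the two vectors.
    merged = list(image_features) + sentence
    scores = [sum(w * x for w, x in zip(row, merged)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(scores)
    best = max(range(len(answers)), key=probs.__getitem__)
    return answers[best], probs


# Toy setup: random values stand in for CNN features and trained parameters.
rng = random.Random(0)
IMAGE_DIM, WORD_DIM = 4, 3
answers = ["yes", "no", "two"]
vocabulary = ["is", "there", "a", "dog", "how", "many"]
embeddings = {w: [rng.uniform(-1, 1) for _ in range(WORD_DIM)]
              for w in vocabulary}
weights = [[rng.uniform(-1, 1) for _ in range(IMAGE_DIM + WORD_DIM)]
           for _ in answers]
biases = [0.0 for _ in answers]
image_features = [rng.uniform(0, 1) for _ in range(IMAGE_DIM)]

answer, probs = predict_answer(image_features, ["is", "there", "a", "dog"],
                               embeddings, weights, biases, answers)
print(answer, [round(p, 3) for p in probs])
```

With untrained random weights the predicted answer is meaningless; the sketch only shows how the two feature vectors meet in one input to a classifier over a fixed answer set.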
