Title: Multi-modal gated recurrent units for image description

Apr 20, 2019

Xuelong Li, Aihong Yuan, Xiaoqiang Lu

Abstract: Using a natural language sentence to describe the content of an image is a challenging but very important task. It is challenging because a description must not only capture the objects contained in the image and the relationships among them, but also be relevant and grammatically correct. In this paper, we propose a multi-modal embedding model based on gated recurrent units (GRU) that can generate a variable-length description for a given image. In the training step, we apply a convolutional neural network (CNN) to extract the image feature. The feature is then fed into the multi-modal GRU together with the corresponding sentence representations, and the multi-modal GRU learns the inter-modal relations between image and sentence. In the testing step, when an image is fed into our multi-modal GRU model, a sentence that describes the image content is generated. The experimental results demonstrate that our multi-modal GRU model achieves state-of-the-art performance on the Flickr8K, Flickr30K and MS COCO datasets.
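The pipeline the abstract describes (image feature in, GRU state updates, words out) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the image feature enters the GRU, so the sketch assumes it is projected into the word-embedding space and fed as the first input, and it assumes greedy decoding at test time. Every name here (`GRUCell`, `generate_caption`, `W_img`, `W_emb`, `W_out`) is hypothetical, the image feature is a random stand-in for a CNN output, biases are omitted, and the weights are untrained, so the generated words are meaningless.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Standard GRU cell without biases (hypothetical helper, not from the paper)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One input matrix W and one recurrent matrix U per gate:
        # z = update gate, r = reset gate, h = candidate state.
        self.W = {g: rand_matrix(hidden_dim, input_dim, rng) for g in "zrh"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrh"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in
                zip(matvec(self.W["h"], x), matvec(self.U["h"], reset_h))]
        # Interpolate between the previous state and the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def generate_caption(image_feature, vocab, cell, W_img, W_emb, W_out, max_len=10):
    """Greedy decoding. Assumption: the image is the first GRU input."""
    h = [0.0] * cell.hidden_dim
    x = matvec(W_img, image_feature)  # project image into the embedding space
    words = []
    for _ in range(max_len):
        h = cell.step(x, h)
        scores = matvec(W_out, h)
        idx = max(range(len(vocab)), key=scores.__getitem__)
        if vocab[idx] == "<end>":
            break
        words.append(vocab[idx])
        x = W_emb[idx]  # embedding of the word just emitted
    return words


rng = random.Random(0)
vocab = ["<end>", "a", "dog", "runs", "on", "grass"]
feat_dim, embed_dim, hidden_dim = 8, 4, 5
cell = GRUCell(embed_dim, hidden_dim, rng)
W_img = rand_matrix(embed_dim, feat_dim, rng)
W_emb = rand_matrix(len(vocab), embed_dim, rng)
W_out = rand_matrix(len(vocab), hidden_dim, rng)
image_feature = [rng.uniform(0, 1) for _ in range(feat_dim)]  # stand-in for a CNN feature
caption = generate_caption(image_feature, vocab, cell, W_img, W_emb, W_out)
print(caption)
```

Because decoding stops either at the `<end>` token or at `max_len`, the output length varies with the input, which is the "variable-length description" property the abstract mentions. In a trained model the three projection matrices and the GRU weights would be learned jointly from image-sentence pairs.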

* Multi-modal gated recurrent units for image description. Multimedia Tools Appl. 77(22): 29847-29869 (2018)
* 25 pages, 7 figures, 6 tables, magazine
