Title: Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)

Jun 11, 2015

Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan Yuille

Abstract: In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. It directly models the probability distribution of generating a word given previous words and an image. Image captions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on four benchmark datasets (IAPR TC-12, Flickr 8K, Flickr 30K and MS COCO). Our model outperforms the state-of-the-art methods. In addition, we apply the m-RNN model to retrieval tasks for retrieving images or sentences, and achieve significant performance improvement over the state-of-the-art methods, which directly optimize the ranking objective function for retrieval. The project page of this work is: www.stat.ucla.edu/~junhua.mao/m-RNN.html.
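To make the abstract's description concrete, the following is a minimal sketch of a word distribution conditioned on previous words and an image, with captions drawn by sampling from it. It is not the authors' implementation. The class `TinyMRNN`, every layer size, the additive fusion with `tanh`, the ReLU recurrence, and the random weights are assumptions chosen for illustration, and a fixed vector stands in for the output of the convolutional network.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

class TinyMRNN:
    """Toy stand-in for a sentence RNN and image features meeting in a
    multimodal layer. All names and sizes here are hypothetical."""

    def __init__(self, vocab=12, embed=8, hidden=8, image=16, fused=10, seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            # Small random weights; a real model would learn these.
            return rng.normal(0.0, 0.1, shape)

        self.E = init(vocab, embed)        # word embeddings
        self.U_w = init(hidden, embed)     # embedding -> recurrent layer
        self.U_r = init(hidden, hidden)    # recurrent -> recurrent layer
        self.V_w = init(fused, embed)      # embedding -> multimodal layer
        self.V_r = init(fused, hidden)     # recurrent -> multimodal layer
        self.V_i = init(fused, image)      # image features -> multimodal layer
        self.W_out = init(vocab, fused)    # multimodal layer -> vocabulary scores
        self.hidden = hidden

    def step(self, word_id, r_prev, image_feat):
        """Return P(next word | previous words, image) and the new recurrent state."""
        w = self.E[word_id]
        r = np.maximum(0.0, self.U_w @ w + self.U_r @ r_prev)
        # Assumed fusion: sum the three projections, then apply tanh.
        m = np.tanh(self.V_w @ w + self.V_r @ r + self.V_i @ image_feat)
        return softmax(self.W_out @ m), r

def sample_caption(model, image_feat, bos_id, eos_id, max_len=10, seed=0):
    """Generate word ids by sampling from the model's distribution at each step."""
    rng = np.random.default_rng(seed)
    r = np.zeros(model.hidden)
    word, caption = bos_id, []
    for _ in range(max_len):
        p, r = model.step(word, r, image_feat)
        word = int(rng.choice(len(p), p=p))
        if word == eos_id:
            break
        caption.append(word)
    return caption

def caption_log_prob(model, image_feat, word_ids, bos_id):
    """Log-probability of a given word sequence under the model for one image."""
    r = np.zeros(model.hidden)
    prev, total = bos_id, 0.0
    for w in word_ids:
        p, r = model.step(prev, r, image_feat)
        total += float(np.log(p[w]))
        prev = w
    return total
```

With untrained weights the sampled ids are meaningless; the point is only the data flow. Ranking candidate sentences (or images) by `caption_log_prob` is one plausible way the same distribution could serve retrieval without a dedicated ranking objective, though the abstract does not spell out the scoring rule.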

ICLR 2015. Adds a simple strategy that significantly boosts performance on the image captioning task; more details are given in Section 8 of the paper. The code and related data are available at https://github.com/mjhucla/mRNN-CR. arXiv admin note: substantial text overlap with arXiv:1410.1090
