Title: Sparse Graph to Sequence Learning for Vision Conditioned Long Textual Sequence Generation

Jul 12, 2020

Aditya Mogadala, Marius Mosbach, Dietrich Klakow

Abstract: Generating longer textual sequences conditioned on visual information is an interesting problem to explore. The challenges here go beyond those of standard vision-conditioned sentence-level generation (e.g., image or video captioning), as the task requires producing a brief and coherent story describing the visual content. In this paper, we cast this Vision-to-Sequence problem as a Graph-to-Sequence learning problem and approach it with the Transformer architecture. Specifically, we introduce the Sparse Graph-to-Sequence Transformer (SGST) for encoding the graph and decoding a sequence. The encoder aims to directly encode graph-level semantics, while the decoder is used to generate longer sequences. Experiments conducted on the benchmark image paragraph dataset show that our proposed approach achieves a 13.3% improvement on the CIDEr evaluation measure compared to the previous state-of-the-art approach.
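The abstract does not spell out how the encoder makes use of graph structure. One common way to make a Transformer encoder graph-aware is to restrict self-attention to the edges of the graph, and the sketch below illustrates that general idea only. It is an assumption for illustration, not the SGST implementation: the function name `graph_masked_self_attention`, the single-head form without learned projections, and the toy graph are all hypothetical.

```python
import numpy as np


def graph_masked_self_attention(x, adj):
    """Single-head self-attention restricted to graph edges (illustrative only).

    x   : (n, d) array of node features.
    adj : (n, n) 0/1 adjacency matrix; node i may attend to node j
          only if adj[i, j] is nonzero or i == j.
    Returns the attended features (n, d) and the attention weights (n, n).
    """
    n, d = x.shape
    # Scaled dot-product scores between every pair of nodes.
    scores = x @ x.T / np.sqrt(d)
    # Allow attention along edges and to the node itself (self loops),
    # so every row keeps at least one finite score.
    mask = adj.astype(bool) | np.eye(n, dtype=bool)
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax over each row; masked entries become 0.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights


# Toy chain graph 0 - 1 - 2 (hypothetical data, not from the paper).
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
x = np.random.default_rng(0).normal(size=(3, 4))
out, weights = graph_masked_self_attention(x, adj)
```

In this toy example, nodes 0 and 2 are not connected, so their mutual attention weights are exactly zero while each row of `weights` still sums to one.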

* International Conference on Machine Learning (ICML) 2020 Workshop (https://logicalreasoninggnn.github.io/)
