René Schallner

Image Captioning with Clause-Focused Metrics in a Multi-Modal Setting for Marketing

May 06, 2019

Philipp Harzig, Dan Zecha, Rainer Lienhart, Carolin Kaiser, René Schallner

Abstract: Automatically generating descriptive captions for images is a well-researched area in computer vision. However, existing evaluation approaches focus on measuring the similarity between two sentences, disregarding the fine-grained semantics of the captions. In our setting of images depicting persons interacting with branded products, the subject, predicate, object, and the name of the branded product are important evaluation criteria for the generated captions. Generating image captions under these constraints is a new challenge, which we tackle in this work. By simultaneously predicting integer-valued ratings that describe attributes of the human-product interaction, we optimize a deep neural network architecture in a multi-task learning setting, which considerably improves the caption quality. Furthermore, we introduce a novel metric that allows us to assess whether the generated captions meet our requirements (i.e., subject, predicate, object, and product name), and we describe a series of experiments on caption quality and on how to address annotator disagreements for the image ratings with an approach called soft targets. We also show that our clause-focused metrics are applicable to other image captioning datasets, such as the popular MSCOCO dataset.

* 6 pages, accepted at MIPR 2019
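
The abstract above mentions handling annotator disagreement on integer-valued image ratings with "soft targets". The listing does not say how the authors construct them, so the following is only a minimal sketch of one common reading: the votes of several annotators are normalized into a probability distribution, and a predicted distribution is scored against it with cross-entropy. The function names, the rating scale of 0 to 4, and the example votes are all assumptions made for illustration.

```python
import math
from collections import Counter


def soft_target(ratings, num_classes):
    """Normalize annotator votes (integers in 0..num_classes-1) into a distribution.

    Assumption: each annotator contributes one equally weighted vote.
    """
    counts = Counter(ratings)
    total = len(ratings)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


# Hypothetical example: three annotators rate one image on a 0..4 scale.
votes = [3, 3, 4]
target = soft_target(votes, num_classes=5)

# Two hypothetical model outputs: one splits mass like the annotators did,
# the other puts almost all mass on a single rating.
matching = [0.01, 0.01, 0.01, 0.65, 0.32]
one_sided = [0.01, 0.01, 0.01, 0.96, 0.01]

loss_matching = cross_entropy(target, matching)
loss_one_sided = cross_entropy(target, one_sided)
```

Under this sketch, a prediction that mirrors the split among annotators receives a lower loss than one that commits to a single rating, which is the usual motivation for soft targets over a one-hot majority label.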

Multimodal Image Captioning for Marketing Analysis

Feb 06, 2018

Philipp Harzig, Stephan Brehm, Rainer Lienhart, Carolin Kaiser, René Schallner

Abstract: Automatically captioning images with natural language sentences is an important research topic. State-of-the-art models are able to produce human-like sentences. These models typically describe the depicted scene as a whole and do not target specific objects of interest or emotional relationships between these objects in the image. However, marketing companies need descriptions of these important attributes of a given scene. In our case, objects of interest are consumer goods, which are usually identifiable by a product logo and are associated with certain brands. From a marketing point of view, it is desirable to also evaluate the emotional context of a trademarked product, i.e., whether it appears with a positive or a negative connotation. We address the problem of finding brands in images and deriving corresponding captions by introducing a modified image captioning network. We also add a third output modality, which simultaneously produces real-valued image ratings. Our network is trained using a classification-aware loss function in order to stimulate the generation of sentences with an emphasis on words identifying the brand of a product. We evaluate our model on a dataset of images depicting interactions between humans and branded products. The introduced network improves mean class accuracy by 24.5 percent. Thanks to the third output modality, it also considerably improves the quality of generated captions for images depicting branded products.

* 4 pages, 1 figure, accepted at MIPR 2018
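
The abstract above reports its result as mean class accuracy. The listing does not define the metric, so the sketch below assumes the usual definition: accuracy is computed separately for each true class and then averaged, so that rare brands count as much as frequent ones. The function name and the brand labels are hypothetical.

```python
from collections import defaultdict


def mean_class_accuracy(y_true, y_pred):
    """Average of per-class accuracies (per-class recall), one term per true class."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    per_class = [correct[c] / total[c] for c in total]
    return sum(per_class) / len(per_class)


# Hypothetical labels: a frequent brand and a rare one.
y_true = ["brand_a", "brand_a", "brand_a", "brand_a", "brand_b"]
y_pred = ["brand_a", "brand_a", "brand_a", "brand_a", "brand_a"]

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
mca = mean_class_accuracy(y_true, y_pred)
```

In this toy example a classifier that always predicts the frequent brand is right on 4 of 5 images overall, but its mean class accuracy is only 0.5, because it never recognizes the rare brand.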
