Title: Transformer-based Cross-Modal Recipe Embeddings with Large Batch Training

May 10, 2022

Jing Yang, Junwen Chen, Keiji Yanai

Abstract: In this paper, we present a cross-modal recipe retrieval framework, Transformer-based Network for Large Batch Training (TNLBT), which is inspired by ACME (Adversarial Cross-Modal Embedding) and H-T (Hierarchical Transformer). TNLBT performs retrieval while also generating images from recipe embeddings. We combine a Hierarchical Transformer-based recipe text encoder, a Vision Transformer (ViT)-based recipe image encoder, and an adversarial network architecture to learn better cross-modal embeddings for recipe texts and images. In addition, we use self-supervised learning to exploit the rich information in recipe texts that have no corresponding images. Because recent literature on self-supervised learning indicates that contrastive learning can benefit from a larger batch size, we adopt a large batch size during training and validate its effectiveness. In our experiments, the proposed framework significantly outperformed the current state-of-the-art frameworks in both cross-modal recipe retrieval and image generation on the Recipe1M benchmark. This is the first work to confirm the effectiveness of large batch training for cross-modal recipe embeddings.

* 13 pages, 8 figures
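
The abstract's point about batch size can be illustrated with a generic in-batch contrastive loss. The sketch below is not the TNLBT objective, which the abstract does not specify; it assumes a symmetric InfoNCE-style loss over paired text and image embeddings, and the function name `info_nce` and the `temperature` value are hypothetical. Each recipe text is scored against every image in the batch, so a batch of size B gives each anchor B - 1 negatives.

```python
import math


def info_nce(text_emb, image_emb, temperature=0.1):
    """Symmetric in-batch contrastive loss (illustrative, not the paper's loss).

    text_emb[k] and image_emb[k] are assumed to describe the same recipe;
    every other item in the batch serves as a negative.
    """
    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    texts = [normalize(v) for v in text_emb]
    images = [normalize(v) for v in image_emb]
    n = len(texts)

    # sim[a][b]: temperature-scaled cosine similarity of text a and image b.
    sim = [
        [sum(x * y for x, y in zip(texts[a], images[b])) / temperature
         for b in range(n)]
        for a in range(n)
    ]

    def diagonal_cross_entropy(rows):
        # Softmax cross-entropy where the correct class for row k is column k.
        total = 0.0
        for k, row in enumerate(rows):
            top = max(row)  # subtract the max for numerical stability
            log_z = top + math.log(sum(math.exp(s - top) for s in row))
            total += log_z - row[k]
        return total / len(rows)

    # Average the text-to-image and image-to-text directions.
    columns = [list(col) for col in zip(*sim)]
    return 0.5 * (diagonal_cross_entropy(sim) + diagonal_cross_entropy(columns))
```

Under this formulation the similarity matrix grows as B x B, so a larger batch exposes each anchor to more negatives per step, which is one common explanation for why contrastive objectives tend to benefit from large batches.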
