Abstract:There are a wide range of applications that involve multi-modal data, such as cross-modal retrieval, visual question-answering, and image captioning. Such applications are primarily dependent on aligned distributions of the different constituent modalities. Existing approaches generate latent embeddings for each modality in a joint fashion by representing them in a common manifold. However these joint embedding spaces fail to sufficiently reduce the modality gap, which affects the performance in downstream tasks. We hypothesize that these embeddings retain the intra-class relationships but are unable to preserve the inter-class dynamics. In this paper, we present a novel framework COBRA that aims to train two modalities (image and text) in a joint fashion inspired by the Contrastive Predictive Coding (CPC) and Noise Contrastive Estimation (NCE) paradigms which preserve both inter and intra-class relationships. We empirically show that this framework reduces the modality gap significantly and generates a robust and task agnostic joint-embedding space. We outperform existing work on four diverse downstream tasks spanning across seven benchmark cross-modal datasets.
Abstract:Interest in the research areas related to meme propagation and generation has been increasing rapidly in the last couple of years. Meme datasets available online are either specific to a context or contain no class information. Here, we prepare a large-scale dataset of memes with captions and class labels. The dataset consists of 1.1 million meme captions from 128 classes. We also provide reasoning for the existence of broad categories, called "themes" across the meme dataset; each theme consists of multiple meme classes. Our generation system uses a trained state-of-the-art transformer-based model for caption generation by employing an encoder-decoder architecture. We develop a web interface, called Memeify for users to generate memes of their choice, and explain in detail, the working of individual components of the system. We also perform a qualitative evaluation of the generated memes by conducting a user study. A link to the demonstration of the Memeify system is https://youtu.be/P_Tfs0X-czs.