Abstract: Just like other few-shot learning problems, few-shot segmentation aims to minimize the need for manual annotation, which is particularly costly in segmentation tasks. Even though the few-shot setting reduces this cost for novel test classes, there is still a need to annotate the training data. To alleviate this need, we propose a self-supervised training approach for learning few-shot segmentation models. We first use unsupervised saliency estimation to obtain pseudo-masks on images. We then train a simple prototype-based model over different splits of pseudo-masks and augmentations of images. Our extensive experiments show that the proposed approach achieves promising results, highlighting the potential of self-supervised training. To the best of our knowledge, this is the first work that addresses the unsupervised few-shot segmentation problem on natural images.
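A minimal sketch of the kind of prototype-based segmentation step mentioned in this abstract: a class prototype is obtained by masked average pooling of support features under a pseudo-mask, and the query is segmented by cosine similarity to that prototype. Tensor shapes, function names, and the pooling/similarity choices are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def masked_average_prototype(support_feats, pseudo_mask):
    """support_feats: (B, C, H, W); pseudo_mask: (B, 1, H', W') with values in [0, 1]."""
    mask = F.interpolate(pseudo_mask, size=support_feats.shape[-2:],
                         mode="bilinear", align_corners=False)
    # Masked average pooling: average features inside the pseudo-mask region.
    proto = (support_feats * mask).sum(dim=(2, 3)) / (mask.sum(dim=(2, 3)) + 1e-6)
    return proto  # (B, C)

def segment_query(query_feats, proto):
    """Cosine similarity between every query location and the prototype."""
    q = F.normalize(query_feats, dim=1)            # (B, C, H, W)
    p = F.normalize(proto, dim=1)[..., None, None]  # (B, C, 1, 1)
    return (q * p).sum(dim=1, keepdim=True)         # (B, 1, H, W) similarity map
```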
Abstract: Pre-trained language models have been shown to substantially improve performance in many natural language tasks. Although the early focus of such models was single-language pre-training, recent advances have resulted in cross-lingual and visual pre-training methods. In this paper, we combine these two approaches to learn visually-grounded cross-lingual representations. Specifically, we extend translation language modelling (Lample and Conneau, 2019) with masked region classification and perform pre-training with three-way parallel vision & language corpora. We show that when fine-tuned for multimodal machine translation, these models obtain state-of-the-art performance. We also provide qualitative insights into the usefulness of the learned grounded representations.
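An illustrative sketch of a combined objective of the kind this abstract describes: a translation-language-modelling loss over masked tokens of a concatenated sentence pair plus a masked region classification loss over image regions. The module and variable names (e.g. `VTLMHead`, `num_region_classes`) are assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn

class VTLMHead(nn.Module):
    def __init__(self, hidden_dim, vocab_size, num_region_classes):
        super().__init__()
        self.token_head = nn.Linear(hidden_dim, vocab_size)            # masked-token prediction (TLM)
        self.region_head = nn.Linear(hidden_dim, num_region_classes)   # masked-region classification

    def forward(self, text_states, region_states, token_targets, region_targets):
        # text_states: (B, L, H); region_states: (B, R, H).
        # Targets use -100 at unmasked positions so they are ignored by the loss.
        tlm_loss = nn.functional.cross_entropy(
            self.token_head(text_states).transpose(1, 2), token_targets, ignore_index=-100)
        mrc_loss = nn.functional.cross_entropy(
            self.region_head(region_states).transpose(1, 2), region_targets, ignore_index=-100)
        return tlm_loss + mrc_loss
```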
Abstract: This paper addresses the problem of comprehending procedural commonsense knowledge. This is a challenging task as it requires identifying key entities, keeping track of their state changes, and understanding temporal and causal relations. Unlike most previous work, in this study we do not rely on strong inductive biases and instead explore how multimodality can be exploited to provide a complementary semantic signal. To this end, we introduce a new entity-aware neural comprehension model augmented with external relational memory units. Our model learns to dynamically update entity states in relation to each other while reading the text instructions. Our experimental analysis on the visual reasoning tasks of the recently proposed RecipeQA dataset reveals that our approach improves the accuracy of previously reported models by a large margin. Moreover, we find that our model learns effective dynamic representations of entities even though we do not use any supervision at the level of entity states.
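A minimal sketch of an entity-state update of the kind this abstract refers to: each entity keeps a memory vector that is updated from the other entities' states (a relational, attention-based step) and from the current instruction representation. The class name, the use of multi-head attention, and the GRU gating are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class RelationalEntityMemory(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.gru = nn.GRUCell(dim, dim)

    def forward(self, entity_states, step_repr):
        """entity_states: (B, E, D); step_repr: (B, D) encoding of the current instruction."""
        B, E, D = entity_states.shape
        # Relational step: entities attend to each other.
        related, _ = self.self_attn(entity_states, entity_states, entity_states)
        # Each entity state is then updated with the instruction representation.
        inp = step_repr.unsqueeze(1).expand(B, E, D).reshape(B * E, D)
        new_states = self.gru(inp, related.reshape(B * E, D))
        return new_states.view(B, E, D)
```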