Jason Sun

Visual Representation Learning with Self-Supervised Attention for Low-Label High-data Regime

Jan 30, 2022

Prarthana Bhattacharyya, Chenge Li, Xiaonan Zhao, István Fehérvári, Jason Sun

Abstract: Self-supervision has shown outstanding results for natural language processing, and more recently, for image recognition. Simultaneously, vision transformers and their variants have emerged as a promising and scalable alternative to convolutions on various computer vision tasks. In this paper, we are the first to ask whether self-supervised vision transformers (SSL-ViTs) can be adapted to two important computer vision tasks in the low-label, high-data regime: few-shot image classification and zero-shot image retrieval. The motivation is to reduce the number of manual annotations required to train a visual embedder, and to produce generalizable and semantically meaningful embeddings. For few-shot image classification, we train SSL-ViTs without any supervision on external data, and use this trained embedder to adapt quickly to novel classes with a limited number of labels. For zero-shot image retrieval, we use SSL-ViTs pre-trained on a large dataset without any labels and fine-tune them with several metric learning objectives. Our self-supervised attention representations outperform the state of the art on several public benchmarks for both tasks, namely miniImageNet and CUB200 for few-shot image classification by up to 6%-10%, and Stanford Online Products, Cars196 and CUB200 for zero-shot image retrieval by up to 4%-11%. Code is available at https://github.com/AutoVision-cloud/SSL-ViT-lowlabel-highdata.

* Accepted to ICASSP-2022
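
As a rough illustration of how a frozen embedder can adapt to novel classes from a handful of labels, the sketch below classifies a query by its nearest class prototype. This is an assumption for illustration only: the abstract does not state which few-shot classifier the authors use, and the function names (`l2_normalize`, `class_prototypes`, `classify`) and the toy 2-D vectors are hypothetical stand-ins for real SSL-ViT embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def class_prototypes(support_embeddings, support_labels):
    """Average the normalized support embeddings of each class."""
    sums, counts = {}, {}
    for emb, label in zip(support_embeddings, support_labels):
        emb = l2_normalize(emb)
        if label not in sums:
            sums[label] = [0.0] * len(emb)
            counts[label] = 0
        sums[label] = [s + e for s, e in zip(sums[label], emb)]
        counts[label] += 1
    return {
        label: l2_normalize([s / counts[label] for s in sums[label]])
        for label in sums
    }


def classify(query_embedding, prototypes):
    """Return the label whose prototype has the highest cosine similarity."""
    q = l2_normalize(query_embedding)
    return max(
        prototypes,
        key=lambda label: sum(a * b for a, b in zip(q, prototypes[label])),
    )


# Toy 2-way, 2-shot episode; the vectors stand in for embedder outputs.
support = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["a", "a", "b", "b"]
protos = class_prototypes(support, labels)
prediction = classify([0.8, 0.2], protos)
```

In this setup only the prototypes depend on the labeled support set, so the embedder itself needs no label-based training to handle new classes.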

A neural interlingua for multilingual machine translation

Oct 16, 2018

Yichao Lu, Phillip Keung, Faisal Ladhak, Vikas Bhardwaj, Shaonan Zhang, Jason Sun

Abstract: We incorporate an explicit neural interlingua into a multilingual encoder-decoder neural machine translation (NMT) architecture. We demonstrate that our model learns a language-independent representation by performing direct zero-shot translation (without using pivot translation), and by using the source sentence embeddings to create an English Yelp review classifier that, through the mediation of the neural interlingua, can also classify French and German reviews. Furthermore, we show that, despite using a smaller number of parameters than a pairwise collection of bilingual NMT models, our approach produces comparable BLEU scores for each language pair in WMT15.

* Accepted in WMT 18

A practical approach to dialogue response generation in closed domains

Mar 28, 2017

Yichao Lu, Phillip Keung, Shaonan Zhang, Jason Sun, Vikas Bhardwaj

Abstract: We describe a prototype dialogue response generation model for the customer service domain at Amazon. The model, which is trained in a weakly supervised fashion, measures the similarity between customer questions and agent answers using a dual encoder network, a Siamese-like neural network architecture. Answer templates are extracted from embeddings derived from past agent answers, without turn-by-turn annotations. Responses to customer inquiries are generated by selecting the best template from the final set of templates. We show that, in a closed domain like customer service, the selected templates cover >70% of past customer inquiries. Furthermore, the relevance of the model-selected templates is significantly higher than that of templates selected by a standard tf-idf baseline.
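
The template-selection step can be sketched as a similarity search: embed the customer question, score it against each template embedding, and return the best match. This is a minimal sketch under stated assumptions, not the authors' model: the abstract does not name the similarity function, so cosine similarity is assumed here, and `cosine`, `select_template`, and the toy 3-D vectors are hypothetical placeholders for the outputs of the two encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_template(question_vec, template_vecs):
    """Return (index, score) of the template most similar to the question."""
    scores = [cosine(question_vec, t) for t in template_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy embeddings; in practice these would come from the question encoder
# and the answer encoder of a trained dual encoder network.
template_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
question_vec = [0.9, 0.1, 0.0]
best_index, best_score = select_template(question_vec, template_vecs)
```

Because the template embeddings are fixed once extracted, they can be computed ahead of time, leaving only the question to be encoded per request.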
