Abstract: Fine-grained multi-label classification models have broad applications in Amazon production features, such as vision-based label prediction ranging from fashion attribute detection to brand recognition. One challenge in achieving satisfactory performance for these classification tasks in the real world is the cluttered visual background, whose irrelevant pixels distract the model from focusing on the region of interest and making predictions based on that region. In this paper, we introduce a generic semantic-embedding deep neural network that applies spatially aware semantic features through a channel-wise attention-based model, leveraging localization guidance to boost multi-label prediction performance. We observe an average relative improvement of 15.27% in AUC score across all labels compared to the baseline approach. The core experiments and ablation studies involve multi-label fashion attribute classification on Instagram fashion apparel images. We compare our approach against the baseline approach and three alternative approaches for leveraging semantic features; results show favorable performance for our approach.
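As an illustration of channel-wise attention applied to spatially aware semantic features, the sketch below fuses backbone features with a single-channel semantic localization map and gates channels with a squeeze-and-excitation-style module; the module names, fusion scheme, and dimensions are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel gating (illustrative; the paper's
    exact attention design may differ)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (B, C, H, W)
        w = self.pool(x).flatten(1)             # (B, C) channel descriptors
        w = self.fc(w).view(x.size(0), -1, 1, 1)
        return x * w                            # re-weight channels

class SemanticGuidedHead(nn.Module):
    """Hypothetical head: fuse features with a spatial semantic map (e.g., a
    garment localization mask), apply channel attention, then classify."""
    def __init__(self, feat_channels: int, num_labels: int):
        super().__init__()
        self.fuse = nn.Conv2d(feat_channels + 1, feat_channels, kernel_size=1)
        self.attn = ChannelAttention(feat_channels)
        self.classifier = nn.Linear(feat_channels, num_labels)

    def forward(self, feats, semantic_map):
        # semantic_map: (B, 1, H, W), spatially aligned with feats
        x = self.fuse(torch.cat([feats, semantic_map], dim=1))
        x = self.attn(x)
        x = x.mean(dim=(2, 3))                  # global average pool
        return self.classifier(x)               # multi-label logits (train with BCE)
```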
Abstract: Self-supervision has shown outstanding results for natural language processing, and more recently, for image recognition. Simultaneously, vision transformers and their variants have emerged as a promising and scalable alternative to convolutions for various computer vision tasks. In this paper, we are the first to question whether self-supervised vision transformers (SSL-ViTs) can be adapted to two important computer vision tasks in the low-label, high-data regime: few-shot image classification and zero-shot image retrieval. The motivation is to reduce the number of manual annotations required to train a visual embedder, and to produce generalizable and semantically meaningful embeddings. For few-shot image classification, we train SSL-ViTs without any supervision on external data, and use this trained embedder to adapt quickly to novel classes with a limited number of labels. For zero-shot image retrieval, we use SSL-ViTs pre-trained on a large dataset without any labels and fine-tune them with several metric learning objectives. Our self-supervised attention representations outperform the state-of-the-art on several public benchmarks for both tasks, namely miniImageNet and CUB200 for few-shot image classification by up to 6%-10%, and Stanford Online Products, Cars196 and CUB200 for zero-shot image retrieval by up to 4%-11%. Code is available at \url{https://github.com/AutoVision-cloud/SSL-ViT-lowlabel-highdata}.
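A minimal sketch of the few-shot adaptation step, assuming the frozen self-supervised ViT is available as a callable that maps images to (B, D) embeddings; the nearest-prototype classifier and the function names are illustrative stand-ins, not the repository's actual API.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def embed(ssl_vit, images):
    """Map a batch of images (B, 3, H, W) to L2-normalized embeddings."""
    z = ssl_vit(images)                  # (B, D) embeddings from the frozen SSL-ViT
    return F.normalize(z, dim=-1)

@torch.no_grad()
def few_shot_predict(ssl_vit, support_x, support_y, query_x, num_classes):
    """Classify query images by nearest class prototype, built from the few
    labeled support examples of each novel class (illustrative scheme)."""
    s = embed(ssl_vit, support_x)        # (N_support, D)
    q = embed(ssl_vit, query_x)          # (N_query, D)
    prototypes = torch.stack(
        [s[support_y == c].mean(dim=0) for c in range(num_classes)]
    )                                     # (num_classes, D) class centroids
    prototypes = F.normalize(prototypes, dim=-1)
    return (q @ prototypes.t()).argmax(dim=-1)   # cosine-similarity nearest prototype
```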
Abstract: We address the problem of distance metric learning in visual similarity search, defined as learning an image embedding model which projects images into a Euclidean space where semantically and visually similar images are closer and dissimilar images are farther from one another. We present a weakly supervised adaptive triplet loss (ATL) capable of capturing fine-grained semantic similarity that encourages the learned image embedding models to generalize well on cross-domain data. The method uses weakly labeled product description data to implicitly determine fine-grained semantic classes, avoiding the need to annotate large amounts of training data. We evaluate on the Amazon fashion retrieval benchmark and the DeepFashion in-shop retrieval dataset. The method boosts the performance of the triplet loss baseline by 10.6% on cross-domain data and outperforms the state-of-the-art model on all evaluation metrics.
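One plausible instantiation of an adaptive triplet loss, assuming the margin is modulated per triplet by a semantic similarity score derived from the weakly labeled product descriptions; this is a hedged sketch, and the paper's exact ATL formulation may differ.

```python
import torch
import torch.nn.functional as F

def adaptive_triplet_loss(anchor, positive, negative, sem_sim, base_margin=0.2):
    """Illustrative adaptive triplet loss: the margin shrinks when the negative
    is semantically close to the anchor (sem_sim in [0, 1], e.g. derived from
    weak product-description overlap). Hypothetical form, not necessarily ATL."""
    d_ap = F.pairwise_distance(anchor, positive)     # anchor-positive distances, (B,)
    d_an = F.pairwise_distance(anchor, negative)     # anchor-negative distances, (B,)
    margin = base_margin * (1.0 - sem_sim)           # per-triplet adaptive margin
    return F.relu(d_ap - d_an + margin).mean()

# Usage with an embedding model `f` mapping images to (B, D) embeddings:
# loss = adaptive_triplet_loss(f(x_a), f(x_p), f(x_n), sem_sim)
```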