Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ying Zheng

Fuzzy-aware Loss for Source-free Domain Adaptation in Visual Emotion Recognition

Jan 26, 2025

Ying Zheng, Yiyi Zhang, Yi Wang, Lap-Pui Chau

Abstract:Source-free domain adaptation in visual emotion recognition (SFDA-VER) is a highly challenging task that requires adapting VER models to the target domain without relying on source data, which is of great significance for data privacy protection. However, due to the unignorable disparities between visual emotion data and traditional image classification data, existing SFDA methods perform poorly on this task. In this paper, we investigate the SFDA-VER task from a fuzzy perspective and identify two key issues: fuzzy emotion labels and fuzzy pseudo-labels. These issues arise from the inherent uncertainty of emotion annotations and the potential mispredictions in pseudo-labels. To address these issues, we propose a novel fuzzy-aware loss (FAL) to enable the VER model to better learn and adapt to new domains under fuzzy labels. Specifically, FAL modifies the standard cross entropy loss and focuses on adjusting the losses of non-predicted categories, which prevents a large number of uncertain or incorrect predictions from overwhelming the VER model during adaptation. In addition, we provide a theoretical analysis of FAL and prove its robustness in handling the noise in generated pseudo-labels. Extensive experiments on 26 domain adaptation sub-tasks across three benchmark datasets demonstrate the effectiveness of our method.

Via

Access Paper or Ask Questions

A Survey of Embodied Learning for Object-Centric Robotic Manipulation

Aug 21, 2024

Ying Zheng, Lei Yao, Yuejiao Su, Yi Zhang, Yi Wang, Sicheng Zhao, Yiyi Zhang, Lap-Pui Chau

Figure 1 for A Survey of Embodied Learning for Object-Centric Robotic Manipulation

Figure 2 for A Survey of Embodied Learning for Object-Centric Robotic Manipulation

Figure 3 for A Survey of Embodied Learning for Object-Centric Robotic Manipulation

Figure 4 for A Survey of Embodied Learning for Object-Centric Robotic Manipulation

Abstract:Embodied learning for object-centric robotic manipulation is a rapidly developing and challenging area in embodied AI. It is crucial for advancing next-generation intelligent robots and has garnered significant interest recently. Unlike data-driven machine learning methods, embodied learning focuses on robot learning through physical interaction with the environment and perceptual feedback, making it especially suitable for robotic manipulation. In this paper, we provide a comprehensive survey of the latest advancements in this field and categorize the existing work into three main branches: 1) Embodied perceptual learning, which aims to predict object pose and affordance through various data representations; 2) Embodied policy learning, which focuses on generating optimal robotic decisions using methods such as reinforcement learning and imitation learning; 3) Embodied task-oriented learning, designed to optimize the robot's performance based on the characteristics of different tasks in object grasping and manipulation. In addition, we offer an overview and discussion of public datasets, evaluation metrics, representative applications, current challenges, and potential future research directions. A project associated with this survey has been established at https://github.com/RayYoh/OCRM_survey.

Via

Access Paper or Ask Questions

Leaf Cultivar Identification via Prototype-enhanced Learning

May 05, 2023

Yiyi Zhang, Zhiwen Ying, Ying Zheng, Cuiling Wu, Nannan Li, Jun Wang, Xianzhong Feng, Xiaogang Xu

Abstract:Plant leaf identification is crucial for biodiversity protection and conservation and has gradually attracted the attention of academia in recent years. Due to the high similarity among different varieties, leaf cultivar recognition is also considered to be an ultra-fine-grained visual classification (UFGVC) task, which is facing a huge challenge. In practice, an instance may be related to multiple varieties to varying degrees, especially in the UFGVC datasets. However, deep learning methods trained on one-hot labels fail to reflect patterns shared across categories and thus perform poorly on this task. To address this issue, we generate soft targets integrated with inter-class similarity information. Specifically, we continuously update the prototypical features for each category and then capture the similarity scores between instances and prototypes accordingly. Original one-hot labels and the similarity scores are incorporated to yield enhanced labels. Prototype-enhanced soft labels not only contain original one-hot label information, but also introduce rich inter-category semantic association information, thus providing more effective supervision for deep model training. Extensive experimental results on public datasets show that our method can significantly improve the performance on the UFGVC task of leaf cultivar identification.

Via

Access Paper or Ask Questions

How Well Do Self-Supervised Methods Perform in Cross-Domain Few-Shot Learning?

Feb 18, 2022

Yiyi Zhang, Ying Zheng, Xiaogang Xu, Jun Wang

Abstract:Cross-domain few-shot learning (CDFSL) remains a largely unsolved problem in the area of computer vision, while self-supervised learning presents a promising solution. Both learning methods attempt to alleviate the dependency of deep networks on the requirement of large-scale labeled data. Although self-supervised methods have recently advanced dramatically, their utility on CDFSL is relatively unexplored. In this paper, we investigate the role of self-supervised representation learning in the context of CDFSL via a thorough evaluation of existing methods. It comes as a surprise that even with shallow architectures or small training datasets, self-supervised methods can perform favorably compared to the existing SOTA methods. Nevertheless, no single self-supervised approach dominates all datasets indicating that existing self-supervised methods are not universally applicable. In addition, we find that representations extracted from self-supervised methods exhibit stronger robustness than the supervised method. Intriguingly, whether self-supervised representations perform well on the source domain has little correlation with their applicability on the target domain. As part of our study, we conduct an objective measurement of the performance for six kinds of representative classifiers. The results suggest Prototypical Classifier as the standard evaluation recipe for CDFSL.

Via

Access Paper or Ask Questions

Adaptive Semantic-Visual Tree for Hierarchical Embeddings

Mar 08, 2020

Shuo Yang, Wei Yu, Ying Zheng, Hongxun Yao, Tao Mei

Figure 1 for Adaptive Semantic-Visual Tree for Hierarchical Embeddings

Figure 2 for Adaptive Semantic-Visual Tree for Hierarchical Embeddings

Figure 3 for Adaptive Semantic-Visual Tree for Hierarchical Embeddings

Figure 4 for Adaptive Semantic-Visual Tree for Hierarchical Embeddings

Abstract:Merchandise categories inherently form a semantic hierarchy with different levels of concept abstraction, especially for fine-grained categories. This hierarchy encodes rich correlations among various categories across different levels, which can effectively regularize the semantic space and thus make predictions less ambiguous. However, previous studies of fine-grained image retrieval primarily focus on semantic similarities or visual similarities. In a real application, merely using visual similarity may not satisfy the need of consumers to search merchandise with real-life images, e.g., given a red coat as a query image, we might get a red suit in recall results only based on visual similarity since they are visually similar. But the users actually want a coat rather than suit even the coat is with different color or texture attributes. We introduce this new problem based on photoshopping in real practice. That's why semantic information are integrated to regularize the margins to make "semantic" prior to "visual". To solve this new problem, we propose a hierarchical adaptive semantic-visual tree (ASVT) to depict the architecture of merchandise categories, which evaluates semantic similarities between different semantic levels and visual similarities within the same semantic class simultaneously. The semantic information satisfies the demand of consumers for similar merchandise with the query while the visual information optimizes the correlations within the semantic class. At each level, we set different margins based on the semantic hierarchy and incorporate them as prior information to learn a fine-grained feature embedding. To evaluate our framework, we propose a new dataset named JDProduct, with hierarchical labels collected from actual image queries and official merchandise images on an online shopping application. Extensive experimental results on the public CARS196 and CUB-

* The 27th ACM Multimedia (2019) 2097-2105

Via

Access Paper or Ask Questions

Sketch-Specific Data Augmentation for Freehand Sketch Recognition

Oct 14, 2019

Ying Zheng, Hongxun Yao, Xiaoshuai Sun, Shengping Zhang, Sicheng Zhao, Fatih Porikli

Figure 1 for Sketch-Specific Data Augmentation for Freehand Sketch Recognition

Figure 2 for Sketch-Specific Data Augmentation for Freehand Sketch Recognition

Figure 3 for Sketch-Specific Data Augmentation for Freehand Sketch Recognition

Figure 4 for Sketch-Specific Data Augmentation for Freehand Sketch Recognition

Abstract:Sketch recognition remains a significant challenge due to the limited training data and the substantial intra-class variance of freehand sketches for the same object. Conventional methods for this task often rely on the availability of the temporal order of sketch strokes, additional cues acquired from different modalities and supervised augmentation of sketch datasets with real images, which also limit the applicability and feasibility of these methods in real scenarios. In this paper, we propose a novel sketch-specific data augmentation (SSDA) method that leverages the quantity and quality of the sketches automatically. From the aspect of quantity, we introduce a Bezier pivot based deformation (BPD) strategy to enrich the training data. Towards quality improvement, we present a mean stroke reconstruction (MSR) approach to generate a set of novel types of sketches with smaller intra-class variances. Both of these solutions are unrestricted from any multi-source data and temporal cues of sketches. Furthermore, we show that some recent deep convolutional neural network models that are trained on generic classes of real images can be better choices than most of the elaborate architectures that are designed explicitly for sketch recognition. As SSDA can be integrated with any convolutional neural networks, it has a distinct advantage over the existing methods. Our extensive experimental evaluations demonstrate that the proposed method achieves state-of-the-art results (84.27%) on the TU-Berlin dataset, outperforming the human performance by a remarkable 11.17% increase. We also present a new benchmark named Sketchy-R to facilitate future research in sketch recognition. Finally, more experiments show the practical value of our approach to the task of sketch-based image retrieval.

Via

Access Paper or Ask Questions

Deep Semantic Parsing of Freehand Sketches with Homogeneous Transformation, Soft-Weighted Loss, and Staged Learning

Oct 14, 2019

Ying Zheng, Hongxun Yao, Xiaoshuai Sun

Figure 1 for Deep Semantic Parsing of Freehand Sketches with Homogeneous Transformation, Soft-Weighted Loss, and Staged Learning

Figure 2 for Deep Semantic Parsing of Freehand Sketches with Homogeneous Transformation, Soft-Weighted Loss, and Staged Learning

Figure 3 for Deep Semantic Parsing of Freehand Sketches with Homogeneous Transformation, Soft-Weighted Loss, and Staged Learning

Figure 4 for Deep Semantic Parsing of Freehand Sketches with Homogeneous Transformation, Soft-Weighted Loss, and Staged Learning

Abstract:In this paper, we propose a novel deep framework for part-level semantic parsing of freehand sketches, which makes three main contributions that are experimentally shown to have substantial practical merit. First, we introduce a new idea named homogeneous transformation to address the problem of domain adaptation. For the task of sketch parsing, there is no available data of labeled freehand sketches that can be directly used for model training. An alternative solution is to learn from the existing parsing data of real images, while the domain adaptation is an inevitable problem. Unlike existing methods that utilize the edge maps of real images to approximate freehand sketches, the proposed homogeneous transformation method transforms the data from two different domains into a homogeneous space to minimize the semantic gap. Second, we design a soft-weighted loss function as guidance for the training process, which gives attention to both the ambiguous label boundary and class imbalance. Third, we present a staged learning strategy to improve the parsing performance of the trained model, which takes advantage of the shared information and specific characteristic from different sketch categories. Extensive experimental results demonstrate the effectiveness of these methods. Specifically, to evaluate the generalization ability of our homogeneous transformation method, additional experiments at the task of sketch-based image retrieval are conducted on the QMUL FG-SBIR dataset. By integrating the proposed three methods into a unified framework, our final deep semantic sketch parsing (DeepSSP) model achieves the state-of-the-art on the public SketchParse dataset.

* 11 pages

Via

Access Paper or Ask Questions