Abstract: Balancing training on long-tail data distributions remains a long-standing challenge in deep learning. While methods such as re-weighting and re-sampling help alleviate the imbalance issue, limited sample diversity continues to hinder models from learning robust and generalizable feature representations, particularly for tail classes. In contrast to existing methods, we offer a novel perspective on long-tail learning, inspired by an observation: datasets with finer granularity tend to be less affected by data imbalance. In this paper, we investigate this phenomenon through both quantitative and qualitative studies, showing that increased granularity enhances the generalization of learned features in tail categories. Motivated by these findings, we propose a method to increase dataset granularity through category extrapolation. Specifically, we introduce open-set auxiliary classes that are visually similar to existing ones, aiming to enhance representation learning for both head and tail classes; this forms the core contribution and insight of our approach. To automate the curation of auxiliary data, we leverage large language models (LLMs) as knowledge bases to search for auxiliary categories and retrieve relevant images through web crawling. To prevent the overwhelming presence of auxiliary classes from disrupting training, we introduce a neighbor-silencing loss that encourages the model to focus on class discrimination within the target dataset. During inference, the classifier weights of the auxiliary categories are masked out, leaving only those of the target classes. Extensive experiments and ablation studies on three standard long-tail benchmarks demonstrate the effectiveness of our approach, which notably outperforms strong baselines trained on the same amount of data. The code will be made publicly available.
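The abstract does not give the exact form of the neighbor-silencing loss, so the snippet below is only a minimal sketch of the two mechanical ideas it describes: down-weighting the influence of auxiliary-class logits during training, and masking the auxiliary classifier weights at inference. The class counts, feature dimension, and the scaling scheme are assumptions for illustration, not the paper's formulation.

```python
# Hypothetical sketch: joint training with open-set auxiliary classes,
# with auxiliary logits down-weighted ("silenced") and masked at test time.
import torch
import torch.nn.functional as F

NUM_TARGET = 100   # classes in the long-tail target dataset (assumed)
NUM_AUX = 400      # auxiliary categories curated via the LLM + web crawling (assumed)

classifier = torch.nn.Linear(512, NUM_TARGET + NUM_AUX)  # target weights come first


def neighbor_silencing_ce(features, labels, aux_weight=0.1):
    """Cross-entropy where auxiliary-class logits are scaled down so that
    discrimination among target classes dominates (assumed form)."""
    logits = classifier(features)
    scale = torch.ones(NUM_TARGET + NUM_AUX, device=logits.device)
    scale[NUM_TARGET:] = aux_weight  # silence auxiliary "neighbor" logits
    return F.cross_entropy(logits * scale, labels)


@torch.no_grad()
def predict(features):
    """At inference, auxiliary columns are masked out; only target classes remain."""
    logits = classifier(features)[:, :NUM_TARGET]
    return logits.argmax(dim=1)
```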
Abstract: Out-of-distribution (OOD) object detection is a challenging task due to the absence of open-set OOD data. Inspired by recent advancements in text-to-image generative models, such as Stable Diffusion, we study the potential of generative models trained on large-scale open-set data to synthesize OOD samples, thereby enhancing OOD object detection. We introduce SyncOOD, a simple data curation method that capitalizes on the capabilities of large foundation models to automatically extract meaningful OOD data from text-to-image generative models. This offers the model access to open-world knowledge encapsulated within off-the-shelf foundation models. The synthetic OOD samples are then employed to augment the training of a lightweight, plug-and-play OOD detector, thus effectively optimizing the in-distribution (ID)/OOD decision boundaries. Extensive experiments across multiple benchmarks demonstrate that SyncOOD significantly outperforms existing methods, establishing new state-of-the-art performance with minimal synthetic data usage.
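To make the "lightweight, plug-and-play OOD detector" concrete, here is a minimal sketch of a binary ID/OOD head trained on detector features from ID boxes and from synthesized OOD samples. The Stable Diffusion curation pipeline and feature extraction are omitted, and all names, dimensions, and the MLP design are assumptions rather than the SyncOOD implementation.

```python
# Minimal sketch (assumed design): a small MLP head separating ID features
# from synthetic OOD features to sharpen the ID/OOD decision boundary.
import torch
import torch.nn.functional as F


class OODHead(torch.nn.Module):
    def __init__(self, feat_dim=1024):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(feat_dim, 256), torch.nn.ReLU(),
            torch.nn.Linear(256, 1))  # single logit: ID (1) vs OOD (0)

    def forward(self, x):
        return self.mlp(x).squeeze(-1)


def train_step(head, opt, id_feats, syn_ood_feats):
    """One optimization step on a batch of ID and synthetic OOD features."""
    feats = torch.cat([id_feats, syn_ood_feats], dim=0)
    labels = torch.cat([torch.ones(len(id_feats)), torch.zeros(len(syn_ood_feats))])
    loss = F.binary_cross_entropy_with_logits(head(feats), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

At test time the head's sigmoid score would simply be thresholded on top of the frozen detector's box features, which is what makes such a module plug-and-play.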
Abstract: Most existing 3D point cloud object detection approaches heavily rely on large amounts of labeled training data. However, the labeling process is costly and time-consuming. This paper considers few-shot 3D point cloud object detection, where only a few annotated samples of novel classes are available alongside abundant samples of base classes. To this end, we propose Prototypical VoteNet to recognize and localize novel instances, which incorporates two new modules: a Prototypical Vote Module (PVM) and a Prototypical Head Module (PHM). Specifically, since basic 3D geometric structures can be shared among categories, PVM is designed to leverage class-agnostic geometric prototypes, learned from base classes, to refine local features of novel categories. PHM is then proposed to utilize class prototypes to enhance the global feature of each object, facilitating subsequent object localization and classification; it is trained with an episodic training strategy. To evaluate models in this new setting, we contribute two new benchmark datasets, FS-ScanNet and FS-SUNRGBD. We conduct extensive experiments to demonstrate the effectiveness of Prototypical VoteNet, which shows significant and consistent improvements over baselines on both benchmark datasets.
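The prototype-based refinement in PVM can be illustrated with a small sketch: local point features attend to a bank of class-agnostic geometric prototypes learned from base classes, and the attended output refines the original features. This is not the official PVM/PHM code; the dimensions, the use of multi-head attention, and the residual form are assumptions for illustration.

```python
# Illustrative sketch (assumed form): refining local point features with a
# learnable bank of class-agnostic geometric prototypes.
import torch


class PrototypeRefine(torch.nn.Module):
    def __init__(self, feat_dim=128, num_prototypes=64):
        super().__init__()
        # geometric prototypes shared across categories, learned from base classes
        self.prototypes = torch.nn.Parameter(torch.randn(num_prototypes, feat_dim))
        self.attn = torch.nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)

    def forward(self, point_feats):  # (B, N_points, feat_dim)
        proto = self.prototypes.unsqueeze(0).expand(point_feats.size(0), -1, -1)
        refined, _ = self.attn(query=point_feats, key=proto, value=proto)
        return point_feats + refined  # residual refinement of local features
```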
Abstract: Although Person Re-Identification has made impressive progress, difficult cases such as occlusion, viewpoint changes, and similar clothing still pose great challenges. Beyond overall visual features, matching and comparing detailed local information is also essential for tackling these challenges. This paper proposes two key recognition patterns to better exploit the local information of pedestrian images. From the spatial perspective, the model should be able to select and align key points from image pairs for comparison (i.e., key-point alignment). From the perspective of feature channels, the feature of a query image should be dynamically adjusted based on the gallery image it needs to match (i.e., conditional feature embedding). Most existing methods are unable to satisfy both key-point alignment and conditional feature embedding. By introducing novel techniques, including a correspondence attention module and a discrepancy-based GCN, we propose an end-to-end ReID method that integrates both patterns into a unified framework, called Siamese-GCN. Experiments show that Siamese-GCN achieves state-of-the-art performance on three public datasets.
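The "conditional feature embedding" pattern can be sketched as follows: the query embedding is modulated by the gallery embedding it is being compared against before scoring the pair. This simple gating formulation is an illustrative assumption, not the Siamese-GCN architecture itself (which uses a correspondence attention module and a discrepancy-based GCN).

```python
# Hedged sketch (assumed gating form): query features re-weighted per gallery image.
import torch


class ConditionalEmbedding(torch.nn.Module):
    def __init__(self, dim=2048):
        super().__init__()
        self.gate = torch.nn.Sequential(
            torch.nn.Linear(2 * dim, dim), torch.nn.Sigmoid())

    def forward(self, query_feat, gallery_feat):  # (B, dim) each
        g = self.gate(torch.cat([query_feat, gallery_feat], dim=1))
        conditioned = g * query_feat  # channel-wise re-weighting of the query
        # matching score for this (query, gallery) pair
        return torch.nn.functional.cosine_similarity(conditioned, gallery_feat, dim=1)
```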
Abstract: In the conventional person Re-ID setting, it is widely assumed that each cropped person image contains a single individual. However, in a crowded scene, off-the-shelf detectors may generate bounding boxes involving multiple people, in which a large proportion of background pedestrians or human occlusion exists. The representation extracted from such cropped images, which contain both the target and interfering pedestrians, may include distracting information, leading to incorrect retrieval results. To address this problem, this paper presents a novel deep network termed Pedestrian-Interference Suppression Network (PISNet). PISNet leverages a Query-Guided Attention Block (QGAB) to enhance the feature of the target in the gallery image under the guidance of the query. Furthermore, the accompanying Guidance Reversed Attention Module and Multi-Person Separation Loss encourage QGAB to suppress the interference of other pedestrians. Our method is evaluated on two new pedestrian-interference datasets, and the results show that it performs favorably against existing Re-ID methods.
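A query-guided attention block of this kind can be sketched in a few lines: the query's global feature produces a spatial attention map over the gallery feature map, enhancing the target region and suppressing interfering pedestrians. The projections and the dot-product/sigmoid attention below are assumptions for illustration, not the QGAB implementation.

```python
# Simplified sketch (assumed form): query-guided spatial attention over a
# gallery feature map.
import torch


class QueryGuidedAttention(torch.nn.Module):
    def __init__(self, channels=2048):
        super().__init__()
        self.proj_q = torch.nn.Linear(channels, channels)
        self.proj_g = torch.nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, query_feat, gallery_map):  # (B, C), (B, C, H, W)
        q = self.proj_q(query_feat)[:, :, None, None]           # (B, C, 1, 1)
        g = self.proj_g(gallery_map)                            # (B, C, H, W)
        attn = torch.sigmoid((q * g).sum(dim=1, keepdim=True))  # (B, 1, H, W)
        return gallery_map * attn  # target regions enhanced, interference suppressed
```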
Abstract: We propose a Generative Transfer Network (GTNet) for zero-shot object detection (ZSD). GTNet consists of an Object Detection Module and a Knowledge Transfer Module. The Object Detection Module learns large-scale seen-domain knowledge. The Knowledge Transfer Module leverages a feature synthesizer to generate unseen-class features, which are used to train a new classification layer for the Object Detection Module. In order to synthesize features for each unseen class with both intra-class variance and IoU variance, we design an IoU-Aware Generative Adversarial Network (IoUGAN) as the feature synthesizer, which can be easily integrated into GTNet. Specifically, IoUGAN consists of three unit models: a Class Feature Generating Unit (CFU), a Foreground Feature Generating Unit (FFU), and a Background Feature Generating Unit (BFU). CFU generates unseen features with intra-class variance conditioned on the class semantic embeddings. FFU and BFU add IoU variance to the results of CFU, yielding class-specific foreground and background features, respectively. We evaluate our method on three public datasets, and the results demonstrate that our method performs favorably against state-of-the-art ZSD approaches.
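The CFU idea of sampling unseen-class features from noise conditioned on class semantic embeddings can be illustrated with a toy generator. The IoU-aware FFU/BFU refinement and the adversarial training loop are omitted, and the layer sizes and embedding dimensions below are assumptions, not the IoUGAN configuration.

```python
# Toy sketch (assumed design): a conditional generator in the spirit of CFU,
# producing unseen-class features from semantic embeddings plus noise.
import torch


class ClassFeatureGenerator(torch.nn.Module):
    def __init__(self, sem_dim=300, noise_dim=128, feat_dim=1024):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = torch.nn.Sequential(
            torch.nn.Linear(sem_dim + noise_dim, 512), torch.nn.LeakyReLU(0.2),
            torch.nn.Linear(512, feat_dim), torch.nn.ReLU())

    def forward(self, class_embedding, num_samples=64):  # class_embedding: (sem_dim,)
        noise = torch.randn(num_samples, self.noise_dim)  # source of intra-class variance
        cond = class_embedding.unsqueeze(0).expand(num_samples, -1)
        return self.net(torch.cat([cond, noise], dim=1))  # synthetic unseen-class features
```

The synthetic features produced this way would then be used to train the new classification layer for the unseen classes, as the abstract describes.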