Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wenhui Li

Domain Adaptation from Generated Multi-Weather Images for Unsupervised Maritime Object Classification

Jan 26, 2025

Dan Song, Shumeng Huo, Wenhui Li, Lanjun Wang, Chao Xue, An-An Liu

Abstract:The classification and recognition of maritime objects are crucial for enhancing maritime safety, monitoring, and intelligent sea environment prediction. However, existing unsupervised methods for maritime object classification often struggle with the long-tail data distributions in both object categories and weather conditions. In this paper, we construct a dataset named AIMO produced by large-scale generative models with diverse weather conditions and balanced object categories, and collect a dataset named RMO with real-world images where long-tail issue exists. We propose a novel domain adaptation approach that leverages AIMO (source domain) to address the problem of limited labeled data, unbalanced distribution and domain shift in RMO (target domain), and enhance the generalization of source features with the Vision-Language Models such as CLIP. Experimental results shows that the proposed method significantly improves the classification accuracy, particularly for samples within rare object categories and weather conditions. Datasets and codes will be publicly available at https://github.com/honoria0204/AIMO.

Via

Access Paper or Ask Questions

Towards Deconfounded Image-Text Matching with Causal Inference

Aug 22, 2024

Wenhui Li, Xinqi Su, Dan Song, Lanjun Wang, Kun Zhang, An-An Liu

Figure 1 for Towards Deconfounded Image-Text Matching with Causal Inference

Figure 2 for Towards Deconfounded Image-Text Matching with Causal Inference

Figure 3 for Towards Deconfounded Image-Text Matching with Causal Inference

Figure 4 for Towards Deconfounded Image-Text Matching with Causal Inference

Abstract:Prior image-text matching methods have shown remarkable performance on many benchmark datasets, but most of them overlook the bias in the dataset, which exists in intra-modal and inter-modal, and tend to learn the spurious correlations that extremely degrade the generalization ability of the model. Furthermore, these methods often incorporate biased external knowledge from large-scale datasets as prior knowledge into image-text matching model, which is inevitable to force model further learn biased associations. To address above limitations, this paper firstly utilizes Structural Causal Models (SCMs) to illustrate how intra- and inter-modal confounders damage the image-text matching. Then, we employ backdoor adjustment to propose an innovative Deconfounded Causal Inference Network (DCIN) for image-text matching task. DCIN (1) decomposes the intra- and inter-modal confounders and incorporates them into the encoding stage of visual and textual features, effectively eliminating the spurious correlations during image-text matching, and (2) uses causal inference to mitigate biases of external knowledge. Consequently, the model can learn causality instead of spurious correlations caused by dataset bias. Extensive experiments on two well-known benchmark datasets, i.e., Flickr30K and MSCOCO, demonstrate the superiority of our proposed method.

* 2023/10/26,Proceedings of the 31st ACM International Conference on Multimedia,6264-6273
* ACM MM

Via

Access Paper or Ask Questions

PRG: Prompt-Based Distillation Without Annotation via Proxy Relational Graph

Aug 22, 2024

Yijin Xu, Jialun Liu, Hualiang Wei, Wenhui Li

Abstract:In this paper, we propose a new distillation method for extracting knowledge from Large Foundation Models (LFM) into lightweight models, introducing a novel supervision mode that does not require manually annotated data. While LFMs exhibit exceptional zero-shot classification abilities across datasets, relying solely on LFM-generated embeddings for distillation poses two main challenges: LFM's task-irrelevant knowledge and the high density of features. The transfer of task-irrelevant knowledge could compromise the student model's discriminative capabilities, and the high density of features within target domains obstructs the extraction of discriminative knowledge essential for the task. To address this issue, we introduce the Proxy Relational Graph (PRG) method. We initially extract task-relevant knowledge from LFMs by calculating a weighted average of logits obtained through text prompt embeddings. Then we construct sample-class proxy graphs for LFM and student models, respectively, to model the correlation between samples and class proxies. Then, we achieve the distillation of selective knowledge by aligning the relational graphs produced by both the LFM and the student model. Specifically, the distillation from LFM to the student model is achieved through two types of alignment: 1) aligning the sample nodes produced by the student model with those produced by the LFM, and 2) aligning the edge relationships in the student model's graph with those in the LFM's graph. Our experimental results validate the effectiveness of PRG, demonstrating its ability to leverage the extensive knowledge base of LFMs while skillfully circumventing their inherent limitations in focused learning scenarios. Notably, in our annotation-free framework, PRG achieves an accuracy of 76.23\% (T: 77.9\%) on CIFAR-100 and 72.44\% (T: 75.3\%) on the ImageNet-1K.

Via

Access Paper or Ask Questions

DAAD: Dynamic Analysis and Adaptive Discriminator for Fake News Detection

Aug 20, 2024

Xinqi Su, Yawen Cui, Ajian Liu, Xun Lin, Yuhao Wang, Haochen Liang, Wenhui Li, Zitong Yu

Abstract:In current web environment, fake news spreads rapidly across online social networks, posing serious threats to society. Existing multimodal fake news detection (MFND) methods can be classified into knowledge-based and semantic-based approaches. However, these methods are overly dependent on human expertise and feedback, lacking flexibility. To address this challenge, we propose a Dynamic Analysis and Adaptive Discriminator (DAAD) approach for fake news detection. For knowledge-based methods, we introduce the Monte Carlo Tree Search (MCTS) algorithm to leverage the self-reflective capabilities of large language models (LLMs) for prompt optimization, providing richer, domain-specific details and guidance to the LLMs, while enabling more flexible integration of LLM comment on news content. For semantic-based methods, we define four typical deceit patterns: emotional exaggeration, logical inconsistency, image manipulation, and semantic inconsistency, to reveal the mechanisms behind fake news creation. To detect these patterns, we carefully design four discriminators and expand them in depth and breadth, using the soft-routing mechanism to explore optimal detection models. Experimental results on three real-world datasets demonstrate the superiority of our approach. The code will be available at: https://github.com/SuXinqi/DAAD.

Via

Access Paper or Ask Questions

MV-CLIP: Multi-View CLIP for Zero-shot 3D Shape Recognition

Nov 30, 2023

Dan Song, Xinwei Fu, Weizhi Nie, Wenhui Li, Anan Liu

Figure 1 for MV-CLIP: Multi-View CLIP for Zero-shot 3D Shape Recognition

Figure 2 for MV-CLIP: Multi-View CLIP for Zero-shot 3D Shape Recognition

Figure 3 for MV-CLIP: Multi-View CLIP for Zero-shot 3D Shape Recognition

Figure 4 for MV-CLIP: Multi-View CLIP for Zero-shot 3D Shape Recognition

Abstract:Large-scale pre-trained models have demonstrated impressive performance in vision and language tasks within open-world scenarios. Due to the lack of comparable pre-trained models for 3D shapes, recent methods utilize language-image pre-training to realize zero-shot 3D shape recognition. However, due to the modality gap, pretrained language-image models are not confident enough in the generalization to 3D shape recognition. Consequently, this paper aims to improve the confidence with view selection and hierarchical prompts. Leveraging the CLIP model as an example, we employ view selection on the vision side by identifying views with high prediction confidence from multiple rendered views of a 3D shape. On the textual side, the strategy of hierarchical prompts is proposed for the first time. The first layer prompts several classification candidates with traditional class-level descriptions, while the second layer refines the prediction based on function-level descriptions or further distinctions between the candidates. Remarkably, without the need for additional training, our proposed method achieves impressive zero-shot 3D classification accuracies of 84.44\%, 91.51\%, and 66.17\% on ModelNet40, ModelNet10, and ShapeNet Core55, respectively. Furthermore, we will make the code publicly available to facilitate reproducibility and further research in this area.

Via

Access Paper or Ask Questions

Mx2M: Masked Cross-Modality Modeling in Domain Adaptation for 3D Semantic Segmentation

Jul 09, 2023

Boxiang Zhang, Zunran Wang, Yonggen Ling, Yuanyuan Guan, Shenghao Zhang, Wenhui Li

Abstract:Existing methods of cross-modal domain adaptation for 3D semantic segmentation predict results only via 2D-3D complementarity that is obtained by cross-modal feature matching. However, as lacking supervision in the target domain, the complementarity is not always reliable. The results are not ideal when the domain gap is large. To solve the problem of lacking supervision, we introduce masked modeling into this task and propose a method Mx2M, which utilizes masked cross-modality modeling to reduce the large domain gap. Our Mx2M contains two components. One is the core solution, cross-modal removal and prediction (xMRP), which makes the Mx2M adapt to various scenarios and provides cross-modal self-supervision. The other is a new way of cross-modal feature matching, the dynamic cross-modal filter (DxMF) that ensures the whole method dynamically uses more suitable 2D-3D complementarity. Evaluation of the Mx2M on three DA scenarios, including Day/Night, USA/Singapore, and A2D2/SemanticKITTI, brings large improvements over previous methods on many metrics.

Via

Access Paper or Ask Questions

Memory-based Jitter: Improving Visual Recognition on Long-tailed Data with Diversity In Memory

Aug 28, 2020

Jialun Liu, Jingwei Zhang, Wenhui Li, Chi Zhang, Yifan Sun

Figure 1 for Memory-based Jitter: Improving Visual Recognition on Long-tailed Data with Diversity In Memory

Figure 2 for Memory-based Jitter: Improving Visual Recognition on Long-tailed Data with Diversity In Memory

Figure 3 for Memory-based Jitter: Improving Visual Recognition on Long-tailed Data with Diversity In Memory

Figure 4 for Memory-based Jitter: Improving Visual Recognition on Long-tailed Data with Diversity In Memory

Abstract:This paper considers deep visual recognition on long-tailed data, with the majority categories only occupying relatively few samples. The tail categories are prone to lack of within-class diversity, which compromises the representative ability of the learned visual concepts. A radical solution is to augment the tail categories with higher diversity. To this end, we introduce a simple and reliable method named Memory-based Jitter (MBJ) to gain extra diversity for the tail data. We observe that the deep model keeps on jittering from one historical edition to another, even when it already approaches convergence. The ``jitter'' means the small variations between historical models. We argue that such jitter largely originates from the within-class diversity of the overall data and thus encodes the within-class distribution pattern. To utilize such jitter for tail data augmentation, we store the jitter among historical models into a memory bank and get the so-called Memory-based Jitter. With slight modifications, MBJ is applicable for two fundamental visual recognition tasks, \emph{i.e.}, image classification and deep metric learning (on long-tailed data). On image classification, MBJ collects the historical embeddings to learn an accurate classifier. In contrast, on deep metric learning, it collects the historical prototypes of each class to learn a robust deep embedding. Under both scenarios, MBJ enforces higher concentration on tail classes, so as to compensate for their lack of diversity. Extensive experiments on three long-tailed classification benchmarks and two deep metric learning benchmarks (person re-identification, in particular) demonstrate the significant improvement. Moreover, the achieved performance are on par with the state-of-the-art on both tasks.

Via

Access Paper or Ask Questions

Deep Representation Learning on Long-tailed Data: A Learnable Embedding Augmentation Perspective

Feb 26, 2020

Jialun Liu, Yifan Sun, Chuchu Han, Zhaopeng Dou, Wenhui Li

Figure 1 for Deep Representation Learning on Long-tailed Data: A Learnable Embedding Augmentation Perspective

Figure 2 for Deep Representation Learning on Long-tailed Data: A Learnable Embedding Augmentation Perspective

Figure 3 for Deep Representation Learning on Long-tailed Data: A Learnable Embedding Augmentation Perspective

Figure 4 for Deep Representation Learning on Long-tailed Data: A Learnable Embedding Augmentation Perspective

Abstract:This paper considers learning deep features from long-tailed data. We observe that in the deep feature space, the head classes and the tail classes present different distribution patterns. The head classes have a relatively large spatial span, while the tail classes have significantly small spatial span, due to the lack of intra-class diversity. This uneven distribution between head and tail classes distorts the overall feature space, which compromises the discriminative ability of the learned features. Intuitively, we seek to expand the distribution of the tail classes by transferring from the head classes, so as to alleviate the distortion of the feature space. To this end, we propose to construct each feature into a "feature cloud". If a sample belongs to a tail class, the corresponding feature cloud will have relatively large distribution range, in compensation to its lack of diversity. It allows each tail sample to push the samples from other classes far away, recovering the intra-class diversity of tail classes. Extensive experimental evaluations on person re-identification and face recognition tasks confirm the effectiveness of our method.

Via

Access Paper or Ask Questions

Saliency guided deep network for weakly-supervised image segmentation

Oct 19, 2018

Fengdong Sun, Wenhui Li

Figure 1 for Saliency guided deep network for weakly-supervised image segmentation

Figure 2 for Saliency guided deep network for weakly-supervised image segmentation

Figure 3 for Saliency guided deep network for weakly-supervised image segmentation

Figure 4 for Saliency guided deep network for weakly-supervised image segmentation

Abstract:Weakly-supervised image segmentation is an important task in computer vision. A key problem is how to obtain high quality objects location from image-level category. Classification activation mapping is a common method which can be used to generate high-precise object location cues. However these location cues are generally very sparse and small such that they can not provide effective information for image segmentation. In this paper, we propose a saliency guided image segmentation network to resolve this problem. We employ a self-attention saliency method to generate subtle saliency maps, and render the location cues grow as seeds by seeded region growing method to expand pixel-level labels extent. In the process of seeds growing, we use the saliency values to weight the similarity between pixels to control the growing. Therefore saliency information could help generate discriminative object regions, and the effects of wrong salient pixels can be suppressed efficiently. Experimental results on a common segmentation dataset PASCAL VOC2012 demonstrate the effectiveness of our method.

Via

Access Paper or Ask Questions

Self-Attention Recurrent Network for Saliency Detection

Aug 05, 2018

Fengdong Sun, Wenhui Li, Yuanyuan Guan

Figure 1 for Self-Attention Recurrent Network for Saliency Detection

Figure 2 for Self-Attention Recurrent Network for Saliency Detection

Figure 3 for Self-Attention Recurrent Network for Saliency Detection

Figure 4 for Self-Attention Recurrent Network for Saliency Detection

Abstract:Feature maps in deep neural network generally contain different semantics. Existing methods often omit their characteristics that may lead to sub-optimal results. In this paper, we propose a novel end-to-end deep saliency network which could effectively utilize multi-scale feature maps according to their characteristics. Shallow layers often contain more local information, and deep layers have advantages in global semantics. Therefore, the network generates elaborate saliency maps by enhancing local and global information of feature maps in different layers. On one hand, local information of shallow layers is enhanced by a recurrent structure which shared convolution kernel at different time steps. On the other hand, global information of deep layers is utilized by a self-attention module, which generates different attention weights for salient objects and backgrounds thus achieve better performance. Experimental results on four widely used datasets demonstrate that our method has advantages in performance over existing algorithms.

Via

Access Paper or Ask Questions