Dan Song

Domain Adaptation from Generated Multi-Weather Images for Unsupervised Maritime Object Classification

Jan 26, 2025

Dan Song, Shumeng Huo, Wenhui Li, Lanjun Wang, Chao Xue, An-An Liu

Abstract: The classification and recognition of maritime objects are crucial for enhancing maritime safety, monitoring, and intelligent sea environment prediction. However, existing unsupervised methods for maritime object classification often struggle with the long-tail data distributions in both object categories and weather conditions. In this paper, we construct a dataset named AIMO, produced by large-scale generative models with diverse weather conditions and balanced object categories, and collect a dataset named RMO of real-world images in which the long-tail issue exists. We propose a novel domain adaptation approach that leverages AIMO (source domain) to address the problems of limited labeled data, unbalanced distribution, and domain shift in RMO (target domain), and enhances the generalization of source features with Vision-Language Models such as CLIP. Experimental results show that the proposed method significantly improves classification accuracy, particularly for samples within rare object categories and weather conditions. Datasets and code will be publicly available at https://github.com/honoria0204/AIMO.

Towards Deconfounded Image-Text Matching with Causal Inference

Aug 22, 2024

Wenhui Li, Xinqi Su, Dan Song, Lanjun Wang, Kun Zhang, An-An Liu

Abstract: Prior image-text matching methods have shown remarkable performance on many benchmark datasets, but most of them overlook the intra-modal and inter-modal bias in the dataset and tend to learn spurious correlations that severely degrade the generalization ability of the model. Furthermore, these methods often incorporate biased external knowledge from large-scale datasets as prior knowledge into the image-text matching model, which inevitably forces the model to learn further biased associations. To address the above limitations, this paper first uses Structural Causal Models (SCMs) to illustrate how intra- and inter-modal confounders damage image-text matching. Then, we employ backdoor adjustment to propose an innovative Deconfounded Causal Inference Network (DCIN) for the image-text matching task. DCIN (1) decomposes the intra- and inter-modal confounders and incorporates them into the encoding stage of visual and textual features, effectively eliminating spurious correlations during image-text matching, and (2) uses causal inference to mitigate biases of external knowledge. Consequently, the model can learn causality instead of spurious correlations caused by dataset bias. Extensive experiments on two well-known benchmark datasets, i.e., Flickr30K and MSCOCO, demonstrate the superiority of our proposed method.

* 2023/10/26, Proceedings of the 31st ACM International Conference on Multimedia, 6264-6273
* ACM MM
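
The backdoor adjustment named in the abstract above can be illustrated on a toy discrete example. This is only a sketch of the general identity P(Y | do(X)) = sum over z of P(Y | X, z) P(z), not the DCIN architecture; the binary variables, the probability tables, and the function names are all made-up assumptions for illustration.

```python
# Toy backdoor adjustment with a binary confounder Z, feature X, and outcome Y.
# Every probability below is invented for illustration only.

p_z = {0: 0.7, 1: 0.3}                       # P(Z=z)
p_x1_given_z = {0: 0.2, 1: 0.9}              # P(X=1 | Z=z)
p_y1_given_xz = {(0, 0): 0.1, (0, 1): 0.5,   # P(Y=1 | X=x, Z=z), keyed by (x, z)
                 (1, 0): 0.3, (1, 1): 0.7}


def p_x_given_z(x, z):
    """P(X=x | Z=z) for binary x."""
    return p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]


def observational(x):
    """P(Y=1 | X=x): strata are weighted by P(z | x), so the confounder leaks in."""
    joint = {z: p_x_given_z(x, z) * p_z[z] for z in p_z}
    norm = sum(joint.values())
    return sum(p_y1_given_xz[(x, z)] * joint[z] / norm for z in p_z)


def interventional(x):
    """P(Y=1 | do(X=x)): strata are weighted by P(z), independent of x."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


naive_effect = observational(1) - observational(0)
causal_effect = interventional(1) - interventional(0)
print(f"naive: {naive_effect:.3f}, adjusted: {causal_effect:.3f}")
```

In this toy table the naive difference overstates the effect of X because Z raises both X and Y; weighting each stratum by P(z) instead of P(z | x) removes that spurious part.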


BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training

Aug 12, 2024

Xuanpu Zhang, Dan Song, Pengxin Zhan, Qingguo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Anan Liu

Abstract: Image-based virtual try-on is an increasingly popular and important task that generates realistic try-on images of a specific person. Existing methods always employ an accurate mask to remove the original garment in the source image, thus achieving realistic synthesized images in simple and conventional try-on scenarios based on a powerful diffusion model. Therefore, acquiring a suitable mask is vital to the try-on performance of these methods. However, obtaining precise inpainting masks, especially for complex wild try-on data containing diverse foreground occlusions and person poses, is not easy, as Figure 1-Top shows. This difficulty often results in poor performance in more practical and challenging real-life scenarios, such as the selfie scene shown in Figure 1-Bottom. To this end, we propose a novel training paradigm combined with an efficient data augmentation method to acquire large-scale unpaired training data from wild scenarios, thereby significantly improving the try-on performance of our model without the need for additional inpainting masks. In addition, a try-on localization loss is designed to localize a more accurate try-on area and obtain more reasonable try-on results. Notably, our method needs only the reference cloth image, source pose image, and source person image as input, which is more cost-effective and user-friendly than existing methods. Extensive qualitative and quantitative experiments demonstrate superior performance in wild scenarios with such low-demand input.


The Rise of Artificial Intelligence in Educational Measurement: Opportunities and Ethical Challenges

Jun 27, 2024

Okan Bulut, Maggie Beiting-Parrish, Jodi M. Casabianca, Sharon C. Slater, Hong Jiao, Dan Song, Christopher M. Ormerod, Deborah Gbemisola Fabiyi, Rodica Ivan, Cole Walsh(+7 more)

Abstract: The integration of artificial intelligence (AI) in educational measurement has revolutionized assessment methods, enabling automated scoring, rapid content analysis, and personalized feedback through machine learning and natural language processing. These advancements provide timely, consistent feedback and valuable insights into student performance, thereby enhancing the assessment experience. However, the deployment of AI in education also raises significant ethical concerns regarding validity, reliability, transparency, fairness, and equity. Issues such as algorithmic bias and the opacity of AI decision-making processes pose risks of perpetuating inequalities and affecting assessment outcomes. Responding to these concerns, various stakeholders, including educators, policymakers, and organizations, have developed guidelines to ensure ethical AI use in education. The National Council of Measurement in Education's Special Interest Group on AI in Measurement and Education (AIME) also focuses on establishing ethical standards and advancing research in this area. In this paper, a diverse group of AIME members examines the ethical implications of AI-powered tools in educational measurement, explores significant challenges such as automation bias and environmental impact, and proposes solutions to ensure AI's responsible and effective use in education.

* 59 pages, 3 figures, a joint work of the Special Interest Group on Artificial Intelligence in Measurement and Education (AIME) from the National Council of Measurement in Education (NCME)


Better Fit: Accommodate Variations in Clothing Types for Virtual Try-on

Mar 13, 2024

Xuanpu Zhang, Dan Song, Pengxin Zhan, Qingguo Chen, Kuilong Liu, Anan Liu

Abstract: Image-based virtual try-on aims to transfer target in-shop clothing to a dressed model image. Its objectives are to fully remove the original clothing while preserving the contents outside the try-on area, to dress the model naturally in the target clothing, and to correctly inpaint the gap between the target clothing and the original clothing. Tremendous efforts have been made to advance this popular research area, but existing methods cannot preserve the type of the target clothing when the try-on area is affected by the original clothing. In this paper, we focus on the unpaired virtual try-on situation where the target clothing and the original clothing on the model are different, i.e., the practical scenario. To break the correlation between the try-on area and the original clothing and make the model learn the correct information to inpaint, we propose an adaptive mask training paradigm that dynamically adjusts training masks. It not only improves the alignment and fit of clothing but also significantly enhances the fidelity of the virtual try-on experience. Furthermore, we propose for the first time two metrics for unpaired try-on evaluation, the Semantic-Densepose-Ratio (SDR) and Skeleton-LPIPS (S-LPIPS), to evaluate the correctness of clothing type and the accuracy of clothing texture. For unpaired try-on validation, we construct a comprehensive cross-try-on benchmark (Cross-27) with distinctive clothing items and model physiques, covering a broad range of try-on scenarios. Experiments demonstrate the effectiveness of the proposed methods, contributing to the advancement of virtual try-on technology and offering new insights and tools for future research in the field. The code, model, and benchmark will be publicly released.


CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model

Nov 30, 2023

Jianhao Zeng, Dan Song, Weizhi Nie, Hongshuo Tian, Tongtong Wang, Anan Liu

Abstract: Image-based virtual try-on enables users to virtually try on different garments by altering the original clothes in their photographs. Generative Adversarial Networks (GANs) dominate research on image-based virtual try-on, but have not resolved problems such as unnatural garment deformation and blurry generation quality. Recently, diffusion models have emerged with surprising performance across various image generation tasks. While the generative quality of diffusion models is impressive, achieving controllability poses a significant challenge when applying them to virtual try-on tasks, and multiple denoising iterations limit their potential for real-time applications. In this paper, we propose a Controllable Accelerated virtual Try-on with Diffusion Model, called CAT-DM. To enhance controllability, a basic diffusion-based virtual try-on network is designed, which utilizes ControlNet to introduce additional control conditions and improves the feature extraction of garment images. In terms of acceleration, CAT-DM initiates a reverse denoising process with an implicit distribution generated by a pre-trained GAN-based model. Compared with previous try-on methods based on diffusion models, CAT-DM not only retains the pattern and texture details of the in-shop garment but also reduces the sampling steps without compromising generation quality. Extensive experiments demonstrate the superiority of CAT-DM over both GAN-based and diffusion-based methods in producing more realistic images and accurately reproducing garment patterns. Our code and models will be publicly released.
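
One plausible reading of the acceleration idea above, offered only as a sketch and not as the CAT-DM implementation: instead of starting the reverse process from pure noise at step T, noise a coarse GAN output forward to an intermediate step and denoise from there, so fewer reverse steps are needed. The linear beta schedule, the values of `T` and `t_start`, the stand-in `gan_output` vector, and all function names are assumptions introduced for illustration; the denoiser itself is omitted.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_to_step(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(a_bar_t) * x0, (1 - a_bar_t) * I)."""
    a = alpha_bars[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]


T = 1000          # length of the full diffusion chain (assumed)
t_start = 200     # intermediate step at which reverse denoising begins (assumed)
alpha_bars = alpha_bar_schedule(T)
rng = random.Random(0)

# Stand-in for the pixels of a coarse try-on image from a pre-trained GAN.
gan_output = [0.5, -0.2, 0.8, 0.1]

# The reverse process would start from x_start and run t_start steps, not T.
x_start = noise_to_step(gan_output, t_start - 1, alpha_bars, rng)
steps_needed = t_start
print(f"reverse steps: {steps_needed} instead of {T}")
```

The choice of `t_start` trades speed against how much the diffusion model can still correct the GAN result: a small value keeps the output close to the GAN image, a large value approaches ordinary sampling from noise.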


MV-CLIP: Multi-View CLIP for Zero-shot 3D Shape Recognition

Nov 30, 2023

Dan Song, Xinwei Fu, Weizhi Nie, Wenhui Li, Anan Liu

Abstract: Large-scale pre-trained models have demonstrated impressive performance in vision and language tasks within open-world scenarios. Due to the lack of comparable pre-trained models for 3D shapes, recent methods utilize language-image pre-training to realize zero-shot 3D shape recognition. However, due to the modality gap, pre-trained language-image models are not confident enough in the generalization to 3D shape recognition. Consequently, this paper aims to improve the confidence with view selection and hierarchical prompts. Leveraging the CLIP model as an example, we employ view selection on the vision side by identifying views with high prediction confidence from multiple rendered views of a 3D shape. On the textual side, the strategy of hierarchical prompts is proposed for the first time. The first layer prompts several classification candidates with traditional class-level descriptions, while the second layer refines the prediction based on function-level descriptions or further distinctions between the candidates. Remarkably, without the need for additional training, our proposed method achieves impressive zero-shot 3D classification accuracies of 84.44%, 91.51%, and 66.17% on ModelNet40, ModelNet10, and ShapeNet Core55, respectively. Furthermore, we will make the code publicly available to facilitate reproducibility and further research in this area.
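
The view-selection step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that confidence is measured by the entropy of each view's class distribution (lower entropy meaning higher confidence) and that the selected views are fused by averaging. The logits are made-up stand-ins for image-text similarity scores, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; lower means a more confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_confident_views(view_logits, k):
    """Keep the k lowest-entropy views and average their class probabilities."""
    probs = [softmax(row) for row in view_logits]
    order = sorted(range(len(probs)), key=lambda i: entropy(probs[i]))
    keep = order[:k]
    n_classes = len(probs[0])
    fused = [sum(probs[i][c] for i in keep) / len(keep) for c in range(n_classes)]
    return keep, fused


# Hypothetical similarity logits for 4 rendered views and 3 candidate classes.
view_logits = [
    [5.0, 0.0, 0.0],   # confident in class 0
    [0.1, 0.0, 0.2],   # nearly uniform, uninformative view
    [4.0, 1.0, 0.0],   # fairly confident in class 0
    [0.0, 0.3, 0.1],   # nearly uniform, uninformative view
]

keep, fused = select_confident_views(view_logits, k=2)
predicted = max(range(len(fused)), key=lambda c: fused[c])
print(keep, predicted)
```

Under the hierarchical-prompt idea in the abstract, the top few classes in `fused` would then serve as candidates for a second round of scoring against finer-grained text prompts; that stage is not sketched here.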


Image-Based Virtual Try-On: A Survey

Nov 08, 2023

Dan Song, Xuanpu Zhang, Juan Zhou, Weizhi Nie, Ruofeng Tong, An-An Liu

Abstract: Image-based virtual try-on aims to synthesize a naturally dressed person image from a clothing image, which revolutionizes online shopping and inspires related topics within image generation, showing both research significance and commercial potential. However, there is a large gap between current research progress and commercial applications, and the field lacks a comprehensive overview to accelerate its development. In this survey, we provide a comprehensive analysis of the state-of-the-art techniques and methodologies in terms of pipeline architecture, person representation, and key modules such as try-on indication, clothing warping, and the try-on stage. We propose new semantic criteria with CLIP, and evaluate representative methods with uniformly implemented evaluation metrics on the same dataset. In addition to quantitative and qualitative evaluation of current open-source methods, we also utilize ControlNet to fine-tune a recent large image generation model (PBE) to show the future potential of large-scale models on the image-based virtual try-on task. Finally, unresolved issues are identified and future research directions are outlined to highlight key trends and inspire further exploration. The uniformly implemented evaluation metrics, dataset, and collected methods will be made publicly available at https://github.com/little-misfit/Survey-Of-Virtual-Try-On.

* 20 pages, 17 figures


Temporal-spatial Correlation Attention Network for Clinical Data Analysis in Intensive Care Unit

Jun 03, 2023

Weizhi Nie, Yuhe Yu, Chen Zhang, Dan Song, Lina Zhao, Yunpeng Bai

Abstract: In recent years, medical information technology has made it possible for electronic health records (EHRs) to store fairly complete clinical data. This has brought health care into the era of "big data". However, medical data are often sparse and strongly correlated, which means that medical problems cannot be solved effectively. The rapid development of deep learning in recent years has provided opportunities for the use of big data in healthcare. In this paper, we propose a temporal-spatial correlation attention network (TSCAN) to handle clinical characteristic prediction problems such as predicting death, predicting length of stay, detecting physiologic decline, and classifying phenotypes. Based on the design of the attention mechanism model, our approach can effectively remove irrelevant items in clinical data and irrelevant nodes in time according to different tasks, so as to obtain more accurate prediction results. Our method can also find key clinical indicators of important outcomes that can be used to improve treatment options. Our experiments use information from the Medical Information Mart for Intensive Care (MIMIC-IV) database, which is open to the public. Finally, we have achieved significant performance benefits of 2.0% (metric) compared to other SOTA prediction methods. We achieved a staggering 90.7% on mortality rate and 45.1% on length of stay. The source code can be found at https://github.com/yuyuheintju/TSCAN.


Deep Reinforcement Learning Framework for Thoracic Diseases Classification via Prior Knowledge Guidance

Jun 02, 2023

Weizhi Nie, Chen Zhang, Dan Song, Lina Zhao, Yunpeng Bai, Keliang Xie, Anan Liu

Abstract: The chest X-ray is often utilized for diagnosing common thoracic diseases. In recent years, many approaches have been proposed to handle the problem of automatic diagnosis based on chest X-rays. However, the scarcity of labeled data for related diseases still poses a huge challenge to accurate diagnosis. In this paper, we focus on the thorax disease diagnostic problem and propose a novel deep reinforcement learning framework, which introduces prior knowledge to direct the learning of diagnostic agents and in which the model parameters can be continuously updated as the data increases, like a person's learning process. Specifically, (1) prior knowledge can be learned from a model pre-trained on old data or on similar data from other domains, which can effectively reduce the dependence on target domain data, and (2) the reinforcement learning framework can make the diagnostic agent as exploratory as a human being and improve the accuracy of diagnosis through continuous exploration. The method can also effectively solve the model learning problem in the case of few-shot data and improve the generalization ability of the model. Finally, our approach's performance was demonstrated using the well-known NIH ChestX-ray 14 and CheXpert datasets, and we achieved competitive results. The source code can be found here: https://github.com/NeaseZ/MARL.
