Yuncheng Jiang

Towards Explainable Fusion and Balanced Learning in Multimodal Sentiment Analysis

Apr 16, 2025

Miaosen Luo, Yuncheng Jiang, Sijie Mai

Abstract: Multimodal Sentiment Analysis (MSA) faces two critical challenges: the lack of interpretability in the decision logic of multimodal fusion and modality imbalance caused by disparities in inter-modal information density. To address these issues, we propose KAN-MCP, a novel framework that integrates the interpretability of Kolmogorov-Arnold Networks (KAN) with the robustness of the Multimodal Clean Pareto (MCPareto) framework. First, KAN leverages its univariate function decomposition to achieve transparent analysis of cross-modal interactions. This structural design allows direct inspection of feature transformations without relying on external interpretation tools, thereby ensuring both high expressiveness and interpretability. Second, the proposed MCPareto enhances robustness by addressing modality imbalance and noise interference. Specifically, we introduce the Dimensionality Reduction and Denoising Modal Information Bottleneck (DRD-MIB) method, which jointly denoises and reduces feature dimensionality. This approach provides KAN with discriminative low-dimensional inputs to reduce the modeling complexity of KAN while preserving critical sentiment-related information. Furthermore, MCPareto dynamically balances gradient contributions across modalities using the purified features output by DRD-MIB, ensuring lossless transmission of auxiliary signals and effectively alleviating modality imbalance. This synergy of interpretability and robustness not only achieves superior performance on benchmark datasets such as CMU-MOSI, CMU-MOSEI, and CH-SIMS v2 but also offers an intuitive visualization interface through KAN's interpretable architecture.

A Performance Investigation of Multimodal Multiobjective Optimization Algorithms in Solving Two Types of Real-World Problems

Dec 04, 2024

Zhiqiu Chen, Zong-Gan Chen, Yuncheng Jiang, Zhi-Hui Zhan

Abstract: In recent years, multimodal multiobjective optimization algorithms (MMOAs) based on evolutionary computation have been widely studied. However, existing MMOAs are mainly tested on benchmark function sets such as the 2019 IEEE Congress on Evolutionary Computation test suite (CEC 2019), and their performance on real-world problems is neglected. In this paper, two types of real-world multimodal multiobjective optimization problems, in feature selection and location selection respectively, are formulated. Moreover, four real-world datasets of Guangzhou, China are constructed for location selection. An investigation is conducted to evaluate the performance of seven existing MMOAs in solving these two types of real-world problems. An analysis of the experimental results explores the characteristics of the tested MMOAs, providing insights for selecting suitable MMOAs in real-world applications.

* The 2024 International Annual Conference on Complex Systems and Intelligent Science, 6 pages

Privacy-Preserving Federated Foundation Model for Generalist Ultrasound Artificial Intelligence

Nov 25, 2024

Yuncheng Jiang, Chun-Mei Feng, Jinke Ren, Jun Wei, Zixun Zhang, Yiwen Hu, Yunbi Liu, Rui Sun, Xuemei Tang, Juan Du(+10 more)

Abstract: Ultrasound imaging is widely used in clinical diagnosis due to its non-invasive nature and real-time capabilities. However, conventional ultrasound diagnostics face several limitations, including high dependence on physician expertise and suboptimal image quality, which complicates interpretation and increases the likelihood of diagnostic errors. Artificial intelligence (AI) has emerged as a promising solution to enhance clinical diagnosis, particularly in detecting abnormalities across various biomedical imaging modalities. Nonetheless, current AI models for ultrasound imaging face critical challenges. First, these models often require large volumes of labeled medical data, raising concerns over patient privacy breaches. Second, most existing models are task-specific, which restricts their broader clinical utility. To overcome these challenges, we present UltraFedFM, an innovative privacy-preserving ultrasound foundation model. UltraFedFM is collaboratively pre-trained using federated learning across 16 distributed medical institutions in 9 countries, leveraging a dataset of over 1 million ultrasound images covering 19 organs and 10 ultrasound modalities. This extensive and diverse data, combined with a secure training framework, enables UltraFedFM to exhibit strong generalization and diagnostic capabilities. It achieves an average area under the receiver operating characteristic curve of 0.927 for disease diagnosis and a dice similarity coefficient of 0.878 for lesion segmentation. Notably, UltraFedFM surpasses the diagnostic accuracy of mid-level ultrasonographers and matches the performance of expert-level sonographers in the joint diagnosis of 8 common systemic diseases. These findings indicate that UltraFedFM can significantly enhance clinical diagnostics while safeguarding patient privacy, marking an advancement in AI-driven ultrasound imaging for future clinical applications.

A Simplifying and Learnable Graph Convolutional Attention Network for Unsupervised Knowledge Graphs Alignment

Oct 17, 2024

Weishan Cai, Wenjun Ma, Yuncheng Jiang

Abstract: The success of current Entity Alignment (EA) methods depends largely on the supervision information provided by labeled data. Considering the cost of labeled data, most supervised methods are difficult to apply in practical scenarios. Therefore, a growing number of methods based on contrastive learning, active learning, or other deep learning techniques have been developed to address the performance bottleneck caused by the lack of labeled data. However, existing unsupervised EA methods still have limitations: either their modeling complexity is high or they cannot balance the effectiveness and practicality of alignment. To overcome these issues, we propose a Simplifying and Learnable graph convolutional attention network for Unsupervised Knowledge Graphs alignment method (SLU). Specifically, we first introduce LCAT, a new and simple framework, as the backbone network to model the graph structure of two KGs. Then we design a reconstruction method of relation structure based on potential matching relations for efficiently filtering invalid neighborhood information of aligned entities, to improve the usability and scalability of SLU. In addition, a similarity function based on consistency is proposed to better measure the similarity of candidate entity pairs. Finally, we conduct extensive experiments on three datasets of different sizes (15K and 100K) and different types (cross-lingual and monolingual) to verify the superiority of SLU. Experimental results show that SLU significantly improves alignment accuracy, outperforming 25 supervised or unsupervised methods and improving Hits@1 by 6.4% over the best baseline in the best case.

* 14 pages, 3 figures
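
Hits@1, the metric reported in the SLU abstract above, is commonly defined as the fraction of source entities whose correct counterpart is ranked first among the candidates. The following is a minimal Python sketch under that common definition. The function name `hits_at_k` and the toy entity IDs are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def hits_at_k(ranked: Dict[str, List[str]], gold: Dict[str, str], k: int = 1) -> float:
    """Fraction of source entities whose gold target appears in the top-k candidates.

    ranked: source entity -> candidate targets, best candidate first.
    gold:   source entity -> correct target entity.
    """
    if not gold:
        raise ValueError("gold alignment must not be empty")
    hits = sum(1 for src, tgt in gold.items() if tgt in ranked.get(src, [])[:k])
    return hits / len(gold)


# Hypothetical toy rankings for four source entities.
ranked = {
    "e1": ["f1", "f2"],
    "e2": ["f3", "f2"],
    "e3": ["f3", "f1"],
    "e4": ["f4"],
}
gold = {"e1": "f1", "e2": "f2", "e3": "f3", "e4": "f4"}

hits1 = hits_at_k(ranked, gold, k=1)  # e2 is missed at rank 1
hits2 = hits_at_k(ranked, gold, k=2)  # e2 is recovered at rank 2
```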

MixPolyp: Integrating Mask, Box and Scribble Supervision for Enhanced Polyp Segmentation

Sep 25, 2024

Yiwen Hu, Jun Wei, Yuncheng Jiang, Haoyang Li, Shuguang Cui, Zhen Li, Song Wu

Abstract: Limited by expensive labeling, polyp segmentation models are plagued by data shortages. To tackle this, we propose the mixed supervised polyp segmentation paradigm (MixPolyp). Unlike traditional models relying on a single type of annotation, MixPolyp combines diverse annotation types (mask, box, and scribble) within a single model, thereby expanding the range of available data and reducing labeling costs. To achieve this, MixPolyp introduces three novel supervision losses to handle various annotations: Subspace Projection loss (L_SP), Binary Minimum Entropy loss (L_BME), and Linear Regularization loss (L_LR). For box annotations, L_SP eliminates shape inconsistencies between the prediction and the supervision. For scribble annotations, L_BME provides supervision for unlabeled pixels through a minimum entropy constraint, thereby alleviating supervision sparsity. Furthermore, L_LR provides dense supervision by enforcing consistency among the predictions, thus reducing the non-uniqueness. These losses are independent of the model structure, making them generally applicable. They are used only during training, adding no computational cost during inference. Extensive experiments on five datasets demonstrate MixPolyp's effectiveness.

* Accepted in IEEE BIBM 2024
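
The abstract does not give the formula for L_BME. As a rough illustration of what a minimum entropy constraint on unlabeled pixels can look like, the sketch below averages the standard binary entropy of predicted foreground probabilities. This is an assumption about the general technique, not the paper's definition, and the name `binary_entropy_loss` is hypothetical.

```python
import math
from typing import Iterable


def binary_entropy_loss(probs: Iterable[float], eps: float = 1e-7) -> float:
    """Mean binary entropy of per-pixel foreground probabilities.

    Minimizing this value pushes each probability toward 0 or 1, which is
    the usual effect of a minimum entropy constraint (assumed form).
    """
    total, count = 0.0, 0
    for p in probs:
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))
        count += 1
    if count == 0:
        raise ValueError("probs must not be empty")
    return total / count


uncertain = binary_entropy_loss([0.5, 0.5])    # maximal entropy, ln 2 per pixel
confident = binary_entropy_loss([0.01, 0.99])  # near-binary predictions
```

Under this assumed form, confident predictions give a lower loss than uncertain ones, so unlabeled pixels still contribute a training signal.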

Let Video Teaches You More: Video-to-Image Knowledge Distillation using DEtection TRansformer for Medical Video Lesion Detection

Aug 26, 2024

Yuncheng Jiang, Zixun Zhang, Jun Wei, Chun-Mei Feng, Guanbin Li, Xiang Wan, Shuguang Cui, Zhen Li

Abstract: AI-assisted lesion detection models play a crucial role in the early screening of cancer. However, previous image-based models ignore the inter-frame contextual information present in videos. On the other hand, video-based models capture the inter-frame context but are computationally expensive. To mitigate this contradiction, we delve into Video-to-Image knowledge distillation leveraging DEtection TRansformer (V2I-DETR) for the task of medical video lesion detection. V2I-DETR adopts a teacher-student network paradigm. The teacher network aims at extracting temporal contexts from multiple frames and transferring them to the student network, and the student network is an image-based model dedicated to fast prediction in inference. By distilling multi-frame contexts into a single frame, the proposed V2I-DETR combines the advantages of utilizing temporal contexts from video-based models and the inference speed of image-based models. Through extensive experiments, V2I-DETR outperforms previous state-of-the-art methods by a large margin while achieving the same real-time inference speed (30 FPS) as the image-based model.

* BIBM 2024

Towards a Benchmark for Colorectal Cancer Segmentation in Endorectal Ultrasound Videos: Dataset and Model Development

Aug 19, 2024

Yuncheng Jiang, Yiwen Hu, Zixun Zhang, Jun Wei, Chun-Mei Feng, Xuemei Tang, Xiang Wan, Yong Liu, Shuguang Cui, Zhen Li

Abstract: Endorectal ultrasound (ERUS) is an important imaging modality that provides high reliability for diagnosing the depth and boundary of invasion in colorectal cancer. However, the lack of a large-scale ERUS dataset with high-quality annotations hinders the development of automatic ultrasound diagnostics. In this paper, we collected and annotated the first benchmark dataset that covers diverse ERUS scenarios, i.e., colorectal cancer segmentation, detection, and infiltration depth staging. Our ERUS-10K dataset comprises 77 videos and 10,000 high-resolution annotated frames. Based on this dataset, we further introduce a benchmark model for colorectal cancer segmentation, named the Adaptive Sparse-context TRansformer (ASTR). ASTR is designed based on three considerations: scanning mode discrepancy, temporal information, and low computational complexity. For generalizing to different scanning modes, an adaptive scanning-mode augmentation is proposed to convert between raw sector images and linear scan ones. For mining temporal information, the sparse-context transformer is incorporated to integrate inter-frame local and global features. For reducing computational complexity, the sparse-context block is introduced to extract contextual features from auxiliary frames. Finally, on the benchmark dataset, the proposed ASTR model achieves a 77.6% Dice score in rectal cancer segmentation, largely outperforming previous state-of-the-art methods.
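
The Dice score quoted above is conventionally computed as twice the overlap between the predicted and ground-truth masks, divided by the sum of their sizes. A minimal Python sketch of that standard definition on flattened binary masks follows. The function name `dice_score`, the toy masks, and the choice to return 1.0 when both masks are empty are assumptions for illustration, not details from the paper.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient 2|P ∩ T| / (|P| + |T|) for flattened 0/1 masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    total = sum(pred) + sum(target)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# Hypothetical flattened masks: 2 overlapping foreground pixels, 3 per mask.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
score = dice_score(pred, target)  # 2 * 2 / (3 + 3)
```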

A Landscape-Aware Differential Evolution for Multimodal Optimization Problems

Aug 05, 2024

Guo-Yun Lin, Zong-Gan Chen, Yuncheng Jiang, Zhi-Hui Zhan, Jun Zhang

Abstract: Simultaneously locating multiple global peaks and achieving a certain accuracy on the found peaks are two key challenges in solving multimodal optimization problems (MMOPs). In this paper, a landscape-aware differential evolution (LADE) algorithm is proposed for MMOPs, which utilizes landscape knowledge to maintain sufficient diversity and provide efficient search guidance. In detail, the landscape knowledge is efficiently utilized in the following three aspects. First, a landscape-aware peak exploration helps each individual evolve adaptively to locate a peak and simulates the regions of the found peaks according to search history to prevent an individual from relocating a found peak. Second, a landscape-aware peak distinction distinguishes whether an individual locates a new global peak, a new local peak, or a found peak. Accuracy refinement can thus be conducted only on the global peaks to enhance the search efficiency. Third, a landscape-aware reinitialization specifies the initial position of an individual adaptively according to the distribution of the found peaks, which helps explore more peaks. The experiments are conducted on 20 widely used benchmark MMOPs. Experimental results show that LADE obtains generally better or competitive performance compared with seven well-performing algorithms proposed recently and four winner algorithms in the IEEE CEC competitions for multimodal optimization.

* under review
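
For readers unfamiliar with the base algorithm, the sketch below shows one generation of classic differential evolution (DE/rand/1/bin) with greedy selection, minimizing a toy sphere function. It does not implement LADE's landscape-aware exploration, peak distinction, or reinitialization. The function name `de_generation`, the parameter values, and the test function are all assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]


def de_generation(
    pop: List[Vector],
    fitness: Callable[[Vector], float],
    f_scale: float = 0.5,
    cr: float = 0.9,
    rng: Optional[random.Random] = None,
) -> List[Vector]:
    """One DE/rand/1/bin generation for minimization (needs at least 4 individuals)."""
    if rng is None:
        rng = random.Random()
    dim = len(pop[0])
    new_pop = []
    for i, target in enumerate(pop):
        # Pick three distinct individuals other than the target.
        others = [p for j, p in enumerate(pop) if j != i]
        a, b, c = rng.sample(others, 3)
        j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
        trial = [
            a[d] + f_scale * (b[d] - c[d])
            if (rng.random() < cr or d == j_rand)
            else target[d]
            for d in range(dim)
        ]
        # Greedy selection: keep the trial only if it is no worse than the target.
        new_pop.append(trial if fitness(trial) <= fitness(target) else target)
    return new_pop


def sphere(x: Vector) -> float:
    return sum(v * v for v in x)


rng = random.Random(0)
pop = [[rng.uniform(-5.0, 5.0) for _ in range(2)] for _ in range(8)]
initial_best = min(sphere(p) for p in pop)
for _ in range(30):
    pop = de_generation(pop, sphere, rng=rng)
final_best = min(sphere(p) for p in pop)
```

Because selection is greedy, the best fitness in the population can never get worse from one generation to the next. Plain DE like this tends to converge to a single optimum, which is the diversity problem that multimodal variants aim to address.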

OCEAN: An Openspace Collision-free Trajectory Planner for Autonomous Parking Based on ADMM

Mar 08, 2024

Dongxu Wang, Yanbin Lu, Weilong Liu, Hao Zuo, Jiade Xin, Xiang Long, Yuncheng Jiang

Abstract: In this paper, we propose an Openspace Collision-freE trAjectory plaNner (OCEAN) for autonomous parking. OCEAN is an optimization-based trajectory planner accelerated by the Alternating Direction Method of Multipliers (ADMM) with enhanced computational efficiency and robustness, and is suitable for all scenes with few dynamic obstacles. Starting from a hierarchical optimization-based collision avoidance framework, the trajectory planning problem is first warm-started by a collision-free Hybrid A* trajectory, then the collision avoidance trajectory planning problem is reformulated as a smooth and convex dual form, and solved by ADMM in parallel. The optimization variables are carefully split into several groups so that ADMM sub-problems are formulated as Quadratic Programming (QP), Sequential Quadratic Programming (SQP), and Second Order Cone Programming (SOCP) problems that can be efficiently and robustly solved. We validate our method both in hundreds of simulation scenarios and over hundreds of hours in public parking areas. The results show that the proposed method has better system performance compared with other benchmarks.

* 8 pages, 5 figures

ECC-PolypDet: Enhanced CenterNet with Contrastive Learning for Automatic Polyp Detection

Jan 10, 2024

Yuncheng Jiang, Zixun Zhang, Yiwen Hu, Guanbin Li, Xiang Wan, Song Wu, Shuguang Cui, Silin Huang, Zhen Li

Abstract: Accurate polyp detection is critical for early colorectal cancer diagnosis. Although remarkable progress has been achieved in recent years, the complex colon environment and concealed polyps with unclear boundaries still pose severe challenges in this area. Existing methods either involve computationally expensive context aggregation or lack prior modeling of polyps, resulting in poor performance in challenging cases. In this paper, we propose the Enhanced CenterNet with Contrastive Learning (ECC-PolypDet), a two-stage training and end-to-end inference framework that leverages images and bounding box annotations to train a general model and fine-tune it based on the inference score to obtain a final robust model. Specifically, we conduct Box-assisted Contrastive Learning (BCL) during training to minimize the intra-class difference and maximize the inter-class difference between foreground polyps and backgrounds, enabling our model to capture concealed polyps. Moreover, to enhance the recognition of small polyps, we design the Semantic Flow-guided Feature Pyramid Network (SFFPN) to aggregate multi-scale features and the Heatmap Propagation (HP) module to boost the model's attention on polyp targets. In the fine-tuning stage, we introduce the IoU-guided Sample Re-weighting (ISR) mechanism to prioritize hard samples by adaptively adjusting the loss weight for each sample during fine-tuning. Extensive experiments on six large-scale colonoscopy datasets demonstrate the superiority of our model compared with previous state-of-the-art detectors.

* Code available at https://github.com/yuncheng97/ECC-PolypDet/tree/main
