Abstract:In recent years, multimodal multiobjective optimization algorithms (MMOAs) based on evolutionary computation have been widely studied. However, existing MMOAs are mainly tested on benchmark function sets such as the 2019 IEEE Congress on Evolutionary Computation test suite (CEC 2019), and their performance on real-world problems is neglected. In this paper, two types of real-world multimodal multiobjective optimization problems are formulated, one in feature selection and one in location selection. Moreover, four real-world datasets of Guangzhou, China are constructed for location selection. An investigation is conducted to evaluate the performance of seven existing MMOAs in solving these two types of real-world problems. An analysis of the experimental results explores the characteristics of the tested MMOAs, providing insights for selecting suitable MMOAs in real-world applications.
Abstract:Ultrasound imaging is widely used in clinical diagnosis due to its non-invasive nature and real-time capabilities. However, conventional ultrasound diagnostics face several limitations, including high dependence on physician expertise and suboptimal image quality, which complicates interpretation and increases the likelihood of diagnostic errors. Artificial intelligence (AI) has emerged as a promising solution to enhance clinical diagnosis, particularly in detecting abnormalities across various biomedical imaging modalities. Nonetheless, current AI models for ultrasound imaging face critical challenges. First, these models often require large volumes of labeled medical data, raising concerns over patient privacy breaches. Second, most existing models are task-specific, which restricts their broader clinical utility. To overcome these challenges, we present UltraFedFM, an innovative privacy-preserving ultrasound foundation model. UltraFedFM is collaboratively pre-trained using federated learning across 16 distributed medical institutions in 9 countries, leveraging a dataset of over 1 million ultrasound images covering 19 organs and 10 ultrasound modalities. This extensive and diverse data, combined with a secure training framework, enables UltraFedFM to exhibit strong generalization and diagnostic capabilities. It achieves an average area under the receiver operating characteristic curve of 0.927 for disease diagnosis and a dice similarity coefficient of 0.878 for lesion segmentation. Notably, UltraFedFM surpasses the diagnostic accuracy of mid-level ultrasonographers and matches the performance of expert-level sonographers in the joint diagnosis of 8 common systemic diseases. These findings indicate that UltraFedFM can significantly enhance clinical diagnostics while safeguarding patient privacy, marking an advancement in AI-driven ultrasound imaging for future clinical applications.
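The Dice similarity coefficient reported above for lesion segmentation can be illustrated with a minimal sketch. This is the standard definition, 2|A∩B| / (|A| + |B|), applied to flat binary masks; the function name, the toy masks, and the empty-mask convention are assumptions for illustration and are not taken from the UltraFedFM evaluation code.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks (flat 0/1 sequences)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels that are foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example (hypothetical masks): 2 shared foreground pixels, 3 + 3 in total.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
score = dice_coefficient(pred, target)
print(round(score, 4))
```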
Abstract:The success of the Entity Alignment (EA) task currently depends largely on the supervision provided by labeled data. Given the cost of labeling, most supervised methods are difficult to apply in practical scenarios. Therefore, a growing number of works based on contrastive learning, active learning, or other deep learning techniques have been developed to address the performance bottleneck caused by the lack of labeled data. However, existing unsupervised EA methods still have limitations: either their modeling complexity is high, or they cannot balance the effectiveness and practicality of alignment. To overcome these issues, we propose SLU, a Simplifying and Learnable graph convolutional attention network for Unsupervised Knowledge Graphs alignment. Specifically, we first introduce LCAT, a new and simple framework, as the backbone network to model the graph structure of two knowledge graphs (KGs). Then we design a relation structure reconstruction method based on potential matching relations to efficiently filter invalid neighborhood information of aligned entities, improving the usability and scalability of SLU. In addition, a consistency-based similarity function is proposed to better measure the similarity of candidate entity pairs. Finally, we conduct extensive experiments on three datasets of different sizes (15K and 100K) and different types (cross-lingual and monolingual) to verify the superiority of SLU. Experimental results show that SLU significantly improves alignment accuracy, outperforming 25 supervised or unsupervised methods and improving Hits@1 by 6.4% over the best baseline in the best case.
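For readers unfamiliar with the Hits@1 metric cited above, the following is a minimal sketch of how Hits@k is commonly computed from a similarity matrix between source and target entities. The function name, the toy similarity scores, and the gold alignment are hypothetical; the sketch does not reproduce the consistency-based similarity function of SLU.

```python
def hits_at_k(similarity, gold, k=1):
    """Fraction of source entities whose gold target ranks in the top k.

    similarity[i][j] is the score between source entity i and target entity j;
    gold[i] is the index of the correct target for source entity i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidate targets from most to least similar.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if gold[i] in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy similarity matrix (hypothetical values) for three entity pairs.
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
]
gold = [0, 1, 2]
h1 = hits_at_k(sim, gold, k=1)
h2 = hits_at_k(sim, gold, k=2)
print(h1, h2)
```

In this toy case the second source entity is misranked at k=1 but recovered at k=2, which is why Hits@1 is the strictest of the usual Hits@k variants.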
Abstract:Limited by expensive labeling, polyp segmentation models are plagued by data shortages. To tackle this, we propose the mixed supervised polyp segmentation paradigm (MixPolyp). Unlike traditional models relying on a single type of annotation, MixPolyp combines diverse annotation types (mask, box, and scribble) within a single model, thereby expanding the range of available data and reducing labeling costs. To achieve this, MixPolyp introduces three novel supervision losses to handle various annotations: Subspace Projection loss (L_SP), Binary Minimum Entropy loss (L_BME), and Linear Regularization loss (L_LR). For box annotations, L_SP eliminates shape inconsistencies between the prediction and the supervision. For scribble annotations, L_BME provides supervision for unlabeled pixels through a minimum entropy constraint, thereby alleviating supervision sparsity. Furthermore, L_LR provides dense supervision by enforcing consistency among the predictions, thus reducing the non-uniqueness of the predictions. These losses are independent of the model structure, making them generally applicable. They are used only during training, adding no computational cost during inference. Extensive experiments on five datasets demonstrate MixPolyp's effectiveness.
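The idea of a minimum entropy constraint on unlabeled pixels can be sketched as follows. The abstract does not give the exact form of L_BME, so this example assumes the standard binary entropy, averaged over pixels without a scribble label; the function name, arguments, and toy probabilities are hypothetical.

```python
import math


def binary_entropy_loss(probs, labeled_mask, eps=1e-7):
    """Mean binary entropy of foreground probabilities over unlabeled pixels.

    probs: predicted foreground probabilities, one per pixel.
    labeled_mask: True where a pixel already carries a scribble label.
    This is an assumed form, not the published L_BME definition.
    """
    total, count = 0.0, 0
    for p, is_labeled in zip(probs, labeled_mask):
        if is_labeled:
            continue  # labeled pixels are supervised elsewhere
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))
        count += 1
    return total / count if count else 0.0


# Uncertain predictions (p = 0.5) give the maximum entropy, ln 2.
uncertain = binary_entropy_loss([0.5, 0.5, 0.9], [False, False, True])
# Confident predictions give a much smaller loss.
confident = binary_entropy_loss([0.99, 0.01, 0.9], [False, False, True])
print(uncertain, confident)
```

Minimizing such a term pushes predictions on unlabeled pixels toward 0 or 1, which is one common way to extract a training signal where no annotation exists.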
Abstract:AI-assisted lesion detection models play a crucial role in the early screening of cancer. However, previous image-based models ignore the inter-frame contextual information present in videos. On the other hand, video-based models capture the inter-frame context but are computationally expensive. To resolve this trade-off, we explore Video-to-Image knowledge distillation leveraging DEtection TRansformer (V2I-DETR) for the task of medical video lesion detection. V2I-DETR adopts a teacher-student network paradigm. The teacher network extracts temporal contexts from multiple frames and transfers them to the student network, which is an image-based model dedicated to fast prediction at inference time. By distilling multi-frame contexts into a single frame, the proposed V2I-DETR combines the temporal context of video-based models with the inference speed of image-based models. Extensive experiments show that V2I-DETR outperforms previous state-of-the-art methods by a large margin while achieving the same real-time inference speed (30 FPS) as the image-based model.
Abstract:Endorectal ultrasound (ERUS) is an important imaging modality that provides high reliability for diagnosing the depth and boundary of invasion in colorectal cancer. However, the lack of a large-scale ERUS dataset with high-quality annotations hinders the development of automatic ultrasound diagnostics. In this paper, we collected and annotated the first benchmark dataset that covers diverse ERUS scenarios, i.e. colorectal cancer segmentation, detection, and infiltration depth staging. Our ERUS-10K dataset comprises 77 videos and 10,000 high-resolution annotated frames. Based on this dataset, we further introduce a benchmark model for colorectal cancer segmentation, named the Adaptive Sparse-context TRansformer (ASTR). ASTR is designed based on three considerations: scanning mode discrepancy, temporal information, and low computational complexity. For generalizing to different scanning modes, the adaptive scanning-mode augmentation is proposed to convert between raw sector images and linear scan ones. For mining temporal information, the sparse-context transformer is incorporated to integrate inter-frame local and global features. For reducing computational complexity, the sparse-context block is introduced to extract contextual features from auxiliary frames. Finally, on the benchmark dataset, the proposed ASTR model achieves a 77.6% Dice score in rectal cancer segmentation, largely outperforming previous state-of-the-art methods.
Abstract:Simultaneously locating multiple global peaks and achieving a certain accuracy on the found peaks are two key challenges in solving multimodal optimization problems (MMOPs). In this paper, a landscape-aware differential evolution (LADE) algorithm is proposed for MMOPs, which utilizes landscape knowledge to maintain sufficient diversity and provide efficient search guidance. In detail, the landscape knowledge is utilized in the following three aspects. First, a landscape-aware peak exploration helps each individual evolve adaptively to locate a peak and simulates the regions of the found peaks according to the search history, preventing an individual from relocating an already found peak. Second, a landscape-aware peak distinction determines whether an individual has located a new global peak, a new local peak, or a found peak. Accuracy refinement can thus be conducted only on the global peaks to enhance the search efficiency. Third, a landscape-aware reinitialization specifies the initial position of an individual adaptively according to the distribution of the found peaks, which helps explore more peaks. The experiments are conducted on 20 widely used benchmark MMOPs. Experimental results show that LADE obtains generally better or competitive performance compared with seven recently proposed well-performing algorithms and four winning algorithms in the IEEE CEC competitions for multimodal optimization.
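As background for the algorithm described above, the following is a minimal sketch of the classic DE/rand/1/bin differential evolution loop that landscape-aware variants build on. It does not implement LADE: there is no peak exploration, peak distinction, or reinitialization, and it minimizes a single-optimum test function. All names, parameter values (F, CR, population size), and the sphere objective are assumptions for illustration.

```python
import random


def differential_evolution(f, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=100, seed=0):
    """Classic DE/rand/1/bin minimizer (a baseline sketch, not LADE)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for d in range(dim):
                if rng.random() < CR or d == j_rand:
                    # Mutation: base vector plus scaled difference vector.
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    v = min(max(v, lo), hi)  # clip to the search bounds
                else:
                    v = pop[i][d]  # binomial crossover keeps the parent gene
                trial.append(v)
            # Greedy selection: the trial replaces the parent immediately
            # if it is at least as good.
            f_trial = f(trial)
            if f_trial <= fitness[i]:
                pop[i], fitness[i] = trial, f_trial
    best = min(range(pop_size), key=lambda i: fitness[i])
    return pop[best], fitness[best]


def sphere(x):
    """Simple unimodal test function with its minimum at the origin."""
    return sum(v * v for v in x)


best_x, best_f = differential_evolution(sphere, [(-5.0, 5.0)] * 2)
print(best_x, best_f)
```

Because greedy selection drives the whole population toward one optimum, plain DE tends to lose all but one peak on a multimodal landscape, which is the diversity problem that multimodal variants such as the one described above aim to address.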
Abstract:In this paper, we propose an Openspace Collision-freE trAjectory plaNner (OCEAN) for autonomous parking. OCEAN is an optimization-based trajectory planner accelerated by the Alternating Direction Method of Multipliers (ADMM) with enhanced computational efficiency and robustness, and is suitable for all scenes with few dynamic obstacles. Starting from a hierarchical optimization-based collision avoidance framework, the trajectory planning problem is first warm-started by a collision-free Hybrid A* trajectory; the collision avoidance trajectory planning problem is then reformulated as a smooth and convex dual form and solved by ADMM in parallel. The optimization variables are carefully split into several groups so that the ADMM sub-problems are formulated as Quadratic Programming (QP), Sequential Quadratic Programming (SQP), and Second Order Cone Programming (SOCP) problems that can be efficiently and robustly solved. We validate our method in hundreds of simulation scenarios and over hundreds of hours of testing in public parking areas. The results show that the proposed method achieves better system performance than other benchmarks.
Abstract:Accurate polyp detection is critical for early colorectal cancer diagnosis. Although remarkable progress has been achieved in recent years, the complex colon environment and concealed polyps with unclear boundaries still pose severe challenges in this area. Existing methods either involve computationally expensive context aggregation or lack prior modeling of polyps, resulting in poor performance in challenging cases. In this paper, we propose the Enhanced CenterNet with Contrastive Learning (ECC-PolypDet), a two-stage training and end-to-end inference framework that leverages images and bounding box annotations to train a general model and fine-tune it based on the inference score to obtain a final robust model. Specifically, we conduct Box-assisted Contrastive Learning (BCL) during training to minimize the intra-class difference and maximize the inter-class difference between foreground polyps and backgrounds, enabling our model to capture concealed polyps. Moreover, to enhance the recognition of small polyps, we design the Semantic Flow-guided Feature Pyramid Network (SFFPN) to aggregate multi-scale features and the Heatmap Propagation (HP) module to boost the model's attention on polyp targets. In the fine-tuning stage, we introduce the IoU-guided Sample Re-weighting (ISR) mechanism to prioritize hard samples by adaptively adjusting the loss weight for each sample during fine-tuning. Extensive experiments on six large-scale colonoscopy datasets demonstrate the superiority of our model compared with previous state-of-the-art detectors.
Abstract:Automatic polyp segmentation models play a pivotal role in the clinical diagnosis of gastrointestinal diseases. In previous studies, most methods relied on fully supervised approaches, necessitating pixel-level annotations for model training. However, the creation of pixel-level annotations is both expensive and time-consuming, impeding the development of model generalization. In response to this challenge, we introduce ScribblePolyp, a novel scribble-supervised polyp segmentation framework. Unlike fully-supervised models, ScribblePolyp only requires the annotation of two lines (scribble labels) for each image, significantly reducing the labeling cost. Despite the coarse nature of scribble labels, which leave a substantial portion of pixels unlabeled, we propose a two-branch consistency alignment approach to provide supervision for these unlabeled pixels. The first branch employs transformation consistency alignment to narrow the gap between predictions under different transformations of the same input image. The second branch leverages affinity propagation to refine predictions into a soft version, extending additional supervision to unlabeled pixels. In summary, ScribblePolyp is an efficient model that does not rely on teacher models or moving average pseudo labels during training. Extensive experiments on the SUN-SEG dataset underscore the effectiveness of ScribblePolyp, achieving a Dice score of 0.8155, with the potential for a 1.8% improvement in the Dice score through a straightforward self-training strategy.