Abstract:This paper presents our contributions towards advancing the state of Vietnamese language understanding and generation through the development and dissemination of open datasets and pre-trained models for Vietnamese Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs).
Abstract:Gastrointestinal endoscopy is a medical procedure that utilizes a flexible tube equipped with a camera and other instruments to examine the digestive tract. This minimally invasive technique allows for diagnosing and managing various gastrointestinal conditions, including inflammatory bowel disease, gastrointestinal bleeding, and colon cancer. The early detection and identification of lesions in the upper gastrointestinal tract and the identification of malignant polyps that may pose a risk of cancer development are critical components of gastrointestinal endoscopy's diagnostic and therapeutic applications. Therefore, enhancing the detection rates of gastrointestinal disorders can significantly improve a patient's prognosis by increasing the likelihood of timely medical intervention, which may prolong the patient's lifespan and improve overall health outcomes. This paper presents a novel Transformer-based deep neural network designed to perform multiple tasks simultaneously, thereby enabling accurate identification of both upper gastrointestinal tract lesions and colon polyps. Our approach proposes a unique global context-aware module and leverages the powerful MiT backbone, along with a feature alignment block, to enhance the network's representation capability. This novel design leads to a significant improvement in performance across various endoscopic diagnosis tasks. Extensive experiments demonstrate the superior performance of our method compared to other state-of-the-art approaches.
Abstract:Automatic and accurate segmentation of colon polyps is essential for early diagnosis of colorectal cancer. Advanced deep learning models have shown promising results in polyp segmentation. However, they still have limitations in representing multi-scale features and generalization capability. To address these issues, this paper introduces RaBiT, an encoder-decoder model that incorporates a lightweight Transformer-based architecture in the encoder to model multiple-level global semantic relationships. The decoder consists of several bidirectional feature pyramid layers with reverse attention modules to better fuse feature maps at various levels and incrementally refine polyp boundaries. We also propose ideas to lighten the reverse attention module and make it more suitable for multi-class segmentation. Extensive experiments on several benchmark datasets show that our method outperforms existing methods across all datasets while maintaining low computational complexity. Moreover, our method demonstrates high generalization capability in cross-dataset experiments, even when the training and test sets have different characteristics.
Abstract:Semantic segmentation is an essential task in developing medical image diagnosis systems. However, building an annotated medical dataset is expensive. Thus, semi-supervised methods are significant in this circumstance. In semi-supervised learning, the quality of labels plays a crucial role in model performance. In this work, we present a new pseudo labeling strategy that enhances the quality of pseudo labels used for training student networks. We follow the multi-stage semi-supervised training approach, which trains a teacher model on a labeled dataset and then uses the trained teacher to render pseudo labels for student training. By doing so, the pseudo labels will be updated and more precise as training progress. The key difference between previous and our methods is that we update the teacher model during the student training process. So the quality of pseudo labels is improved during the student training process. We also propose a simple but effective strategy to enhance the quality of pseudo labels using a momentum model -- a slow copy version of the original model during training. By applying the momentum model combined with re-rendering pseudo labels during student training, we achieved an average of 84.1% Dice Score on five datasets (i.e., Kvarsir, CVC-ClinicDB, ETIS-LaribPolypDB, CVC-ColonDB, and CVC-300) with only 20% of the dataset used as labeled data. Our results surpass common practice by 3% and even approach fully-supervised results on some datasets. Our source code and pre-trained models are available at https://github.com/sun-asterisk-research/online learning ssl
Abstract:Identifying polyps is a challenging problem for automatic analysis of endoscopic images in computer-aided clinical support systems. Models based on convolutional networks (CNN), transformers, and combinations of them have been proposed to segment polyps with promising results. However, those approaches have limitations either in modeling the local appearance of the polyps only or lack of multi-level features for spatial dependency in the decoding process. This paper proposes a novel network, namely ColonFormer, to address these limitations. ColonFormer is an encoder-decoder architecture with the capability of modeling long-range semantic information at both encoder and decoder branches. The encoder is a lightweight architecture based on transformers for modeling global semantic relations at multi scales. The decoder is a hierarchical network structure designed for learning multi-level features to enrich feature representation. Besides, a refinement module is added with a new skip connection technique to refine the boundary of polyp objects in the global map for accurate segmentation. Extensive experiments have been conducted on five popular benchmark datasets for polyp segmentation, including Kvasir, CVC-Clinic DB, CVCColonDB, EndoScene, and ETIS. Experimental results show that our ColonFormer achieve state-of-the-art performance on all benchmark datasets.
Abstract:In recent years, computer-aided automatic polyp segmentation and neoplasm detection have been an emerging topic in medical image analysis, providing valuable support to colonoscopy procedures. Attentions have been paid to improving the accuracy of polyp detection and segmentation. However, not much focus has been given to latency and throughput for performing these tasks on dedicated devices, which can be crucial for practical applications. This paper introduces a novel deep neural network architecture called BlazeNeo, for the task of polyp segmentation and neoplasm detection with an emphasis on compactness and speed while maintaining high accuracy. The model leverages the highly efficient HarDNet backbone alongside lightweight Receptive Field Blocks for computational efficiency, and an auxiliary training mechanism to take full advantage of the training data for the segmentation quality. Our experiments on a challenging dataset show that BlazeNeo achieves improvements in latency and model size while maintaining comparable accuracy against state-of-the-art methods. When deploying on the Jetson AGX Xavier edge device in INT8 precision, our BlazeNeo achieves over 155 fps while yielding the best accuracy among all compared methods.
Abstract:Automatic polyp segmentation has proven to be immensely helpful for endoscopy procedures, reducing the missing rate of adenoma detection for endoscopists while increasing efficiency. However, classifying a polyp as being neoplasm or not and segmenting it at the pixel level is still a challenging task for doctors to perform in a limited time. In this work, we propose a fine-grained formulation for the polyp segmentation problem. Our formulation aims to not only segment polyp regions, but also identify those at high risk of malignancy with high accuracy. In addition, we present a UNet-based neural network architecture called NeoUNet, along with a hybrid loss function to solve this problem. Experiments show highly competitive results for NeoUNet on our benchmark dataset compared to existing polyp segmentation models.
Abstract:Colorectal cancer is among the most common malignancies and can develop from high-risk colon polyps. Colonoscopy is an effective screening tool to detect and remove polyps, especially in the case of precancerous lesions. However, the missing rate in clinical practice is relatively high due to many factors. The procedure could benefit greatly from using AI models for automatic polyp segmentation, which provide valuable insights for improving colon polyp detection. However, precise segmentation is still challenging due to variations of polyps in size, shape, texture, and color. This paper proposes a novel neural network architecture called AG-CUResNeSt, which enhances Coupled UNets using the robust ResNeSt backbone and attention gates. The network is capable of effectively combining multi-level features to yield accurate polyp segmentation. Experimental results on five popular benchmark datasets show that our proposed method achieves state-of-the-art accuracy compared to existing methods.