Abstract:Automated radiology report generation aims to expedite the tedious and error-prone reporting process for radiologists. While recent works have made progress, learning to align medical images and textual findings remains challenging due to the relative scarcity of labeled medical data. For example, datasets for this task are much smaller than those used for image captioning in computer vision. In this work, we propose to transfer representations from CLIP, a large-scale pre-trained vision-language model, to better capture cross-modal semantics between images and texts. However, directly applying CLIP is suboptimal due to the domain gap between natural images and radiology. To enable efficient adaptation, we introduce UniCrossAdapter, lightweight adapter modules that are incorporated into CLIP and fine-tuned on the target task while keeping base parameters fixed. The adapters are distributed across modalities and their interaction to enhance vision-language alignment. Experiments on two public datasets demonstrate the effectiveness of our approach, advancing state-of-the-art in radiology report generation. The proposed transfer learning framework provides a means of harnessing semantic knowledge from large-scale pre-trained models to tackle data-scarce medical vision-language tasks. Code is available at https://github.com/chauncey-tow/MRG-CLIP.
Abstract:Few-shot video object segmentation aims to reduce annotation costs; however, existing methods still require abundant dense frame annotations for training, which are scarce in the medical domain. We investigate an extremely low-data regime that utilizes annotations from only a few video frames and leverages existing labeled images to minimize costly video annotations. Specifically, we propose a two-phase framework. First, we learn a few-shot segmentation model using labeled images. Subsequently, to improve performance without full supervision, we introduce a spatiotemporal consistency relearning approach on medical videos that enforces consistency between consecutive frames. Constraints are also enforced between the image model and relearning model at both feature and prediction levels. Experiments demonstrate the superiority of our approach over state-of-the-art few-shot segmentation methods. Our model bridges the gap between abundant annotated medical images and scarce, sparsely labeled medical videos to achieve strong video segmentation performance in this low data regime. Code is available at https://github.com/MedAITech/RAB.
Abstract:Video object segmentation is crucial for the efficient analysis of complex medical video data, yet it faces significant challenges in data availability and annotation. We introduce the task of one-shot medical video object segmentation, which requires separating foreground and background pixels throughout a video given only the mask annotation of the first frame. To address this problem, we propose a temporal contrastive memory network comprising image and mask encoders to learn feature representations, a temporal contrastive memory bank that aligns embeddings from adjacent frames while pushing apart distant ones to explicitly model inter-frame relationships and stores these features, and a decoder that fuses encoded image features and memory readouts for segmentation. We also collect a diverse, multi-source medical video dataset spanning various modalities and anatomies to benchmark this task. Extensive experiments demonstrate state-of-the-art performance in segmenting both seen and unseen structures from a single exemplar, showing ability to generalize from scarce labels. This highlights the potential to alleviate annotation burdens for medical video analysis. Code is available at https://github.com/MedAITech/TCMN.
Abstract:Unsupervised anomaly detection using deep learning has garnered significant research attention due to its broad applicability, particularly in medical imaging where labeled anomalous data are scarce. While earlier approaches leverage generative models like autoencoders and generative adversarial networks (GANs), they often fall short due to overgeneralization. Recent methods explore various strategies, including memory banks, normalizing flows, self-supervised learning, and knowledge distillation, to enhance discrimination. Among these, knowledge distillation, particularly reverse distillation, has shown promise. Following this paradigm, we propose a novel scale-aware contrastive reverse distillation model that addresses two key limitations of existing reverse distillation methods: insufficient feature discriminability and inability to handle anomaly scale variations. Specifically, we introduce a contrastive student-teacher learning approach to derive more discriminative representations by generating and exploring out-of-normal distributions. Further, we design a scale adaptation mechanism to softly weight contrastive distillation losses at different scales to account for the scale variation issue. Extensive experiments on benchmark datasets demonstrate state-of-the-art performance, validating the efficacy of the proposed method. Code is available at https://github.com/MedAITech/SCRD4AD.
Abstract:Cell counting in microscopy images is vital in medicine and biology but extremely tedious and time-consuming to perform manually. While automated methods have advanced in recent years, state-of-the-art approaches tend to increasingly complex model designs. In this paper, we propose a conceptually simple yet effective decoupled learning scheme for automated cell counting, consisting of separate counter and localizer networks. In contrast to jointly learning counting and density map estimation, we show that decoupling these objectives surprisingly improves results. The counter operates on intermediate feature maps rather than pixel space to leverage global context and produce count estimates, while also generating coarse density maps. The localizer then reconstructs high-resolution density maps that precisely localize individual cells, conditional on the original images and coarse density maps from the counter. Besides, to boost counting accuracy, we further introduce a global message passing module to integrate cross-region patterns. Extensive experiments on four datasets demonstrate that our approach, despite its simplicity, challenges common practice and achieves state-of-the-art performance by significant margins. Our key insight is that decoupled learning alleviates the need to learn counting on high-resolution density maps directly, allowing the model to focus on global features critical for accurate estimates. Code is available at https://github.com/MedAITech/DCL.
Abstract:Unsupervised anomaly detection using only normal samples is of great significance for quality inspection in industrial manufacturing. Although existing reconstruction-based methods have achieved promising results, they still face two problems: poor distinguishable information in image reconstruction and well abnormal regeneration caused by model over-generalization ability. To overcome the above issues, we convert the image reconstruction into a combination of parallel feature restorations and propose a multi-feature reconstruction network, MFRNet, using crossed-mask restoration in this paper. Specifically, a multi-scale feature aggregator is first developed to generate more discriminative hierarchical representations of the input images from a pre-trained model. Subsequently, a crossed-mask generator is adopted to randomly cover the extracted feature map, followed by a restoration network based on the transformer structure for high-quality repair of the missing regions. Finally, a hybrid loss is equipped to guide model training and anomaly estimation, which gives consideration to both the pixel and structural similarity. Extensive experiments show that our method is highly competitive with or significantly outperforms other state-of-the-arts on four public available datasets and one self-made dataset.
Abstract:This paper studies enhanced dense code multiple access (DCMA) system design for downlink transmission over the Nakagami-$m$ fading channels. By studying the DCMA pairwise error probability (PEP) in a Nakagami-$m$ channel, a novel design metric called minimum logarithmic sum distance (MLSD) is first derived. With respect to the proposed MLSD, we introduce a new family of power-imbalanced dense codebooks by deleting certain rows of a special non-unimodular circulant matrix. Simulation results demonstrate that our proposed dense codebooks lead to both larger minimum Euclidean distance and MLSD, thus yielding significant improvements of error performance over the existing sparse code multiple access and conventional unimodular DCMA schemes in Nakagami-$m$ fading channels under different overloading factors.
Abstract:Owing to the more significant set size properties as compared to the set of complete complementary codes (CCCs), quasi-complementary code sets (QCCSs) are more convenient to support a large number of users in multicarrier code-division multiple-access (MC-CDMA) system over CCCs. Besides set size, it is also desirable to have a low maximum aperiodic correlation magnitude and small alphabet size. This paper aims to construct asymptotically optimal and near-optimal aperiodic QCCSs having a small alphabet size and low maximum correlation magnitude. Using multivariate functions and its associated graph, we propose a family of QCCSs consisting of multiple sets of CCCs and determine the parameters of the proposed QCCSs. Unlike the existing constructions of aperiodic QCCSs, the proposed construction can maintain a small alphabet size irrespective of the increasing sequence length and large set size.
Abstract:In order to accurately detect defects in patterned fabric images, a novel detection algorithm based on Gabor-HOG (GHOG) and low-rank decomposition is proposed in this paper. Defect-free pattern fabric images have the specified direction, while defects damage their regularity of direction. Therefore, a direction-aware descriptor is designed, denoted as GHOG, a combination of Gabor and HOG, which is extremely valuable for localizing the defect region. Upon devising a powerful directional descriptor, an efficient low-rank decomposition model is constructed to divide the matrix generated by the directional feature extracted from image blocks into a low-rank matrix (background information) and a sparse matrix (defect information). A nonconvex log det(.) as a smooth surrogate function for the rank instead of the nuclear norm is also exploited to improve the efficiency of the low-rank model. Moreover, the computational efficiency is further improved by utilizing the alternative direction method of multipliers (ADMM). Thereafter, the saliency map generated by the sparse matrix is segmented via the optimal threshold algorithm to locate the defect regions. Experimental results show that the proposed method can effectively detect patterned fabric defects and outperform the state-of-the-art methods.