Abstract:Cardiac auscultation, an integral tool in diagnosing cardiovascular diseases (CVDs), often relies on the subjective interpretation of clinicians, presenting a limitation in consistency and accuracy. Addressing this, we introduce the BUET Multi-disease Heart Sound (BMD-HS) dataset - a comprehensive and meticulously curated collection of heart sound recordings. This dataset, encompassing 864 recordings across five distinct classes of common heart sounds, represents a broad spectrum of valvular heart diseases, with a focus on diagnostically challenging cases. The standout feature of the BMD-HS dataset is its innovative multi-label annotation system, which captures a diverse range of diseases and unique disease states. This system significantly enhances the dataset's utility for developing advanced machine learning models in automated heart sound classification and diagnosis. By bridging the gap between traditional auscultation practices and contemporary data-driven diagnostic methods, the BMD-HS dataset is poised to revolutionize CVD diagnosis and management, providing an invaluable resource for the advancement of cardiac health research. The dataset is publicly available at this link: https://github.com/mHealthBuet/BMD-HS-Dataset.
Abstract:Despite the advancement of deep learning-based computer-aided diagnosis (CAD) methods for pneumonia from adult chest x-ray (CXR) images, the performance of CAD methods applied to pediatric images remains suboptimal, mainly due to the lack of large-scale annotated pediatric imaging datasets. Establishing a proper framework to leverage existing adult large-scale CXR datasets can thus enhance pediatric pneumonia detection performance. In this paper, we propose a three-branch parallel path learning-based framework that utilizes both adult and pediatric datasets to improve the performance of deep learning models on pediatric test datasets. The paths are trained with pediatric only, adult only, and both types of CXRs, respectively. Our proposed framework utilizes the multi-positive contrastive loss to cluster the classwise embeddings and the embedding similarity loss among these three parallel paths to make the classwise embeddings as close as possible to reduce the effect of domain shift. Experimental evaluations on open-access adult and pediatric CXR datasets show that the proposed method achieves a superior AUROC score of 0.8464 compared to 0.8348 obtained using the conventional approach of join training on both datasets. The proposed approach thus paves the way for generalized CAD models that are effective for both adult and pediatric age groups.
Abstract:Objective: Lung auscultation is a valuable tool in diagnosing and monitoring various respiratory diseases. However, lung sounds (LS) are significantly affected by numerous sources of contamination, especially when recorded in real-world clinical settings. Conventional denoising models prove impractical for LS denoising, primarily owing to spectral overlap complexities arising from diverse noise sources. To address this issue, we propose a specialized deep-learning model (Uformer) for lung sound denoising. Methods: The proposed Uformer model is constituted of three modules: a Convolutional Neural Network (CNN) encoder module, dedicated to extracting latent features; a Transformer encoder module, employed to further enhance the encoding of unique LS features and effectively capture intricate long-range dependencies; and a CNN decoder module, employed to generate the denoised signals. An ablation study was performed in order to find the most optimal architecture. Results: The performance of the proposed Uformer model was evaluated on lung sounds induced with different types of synthetic and real-world noises. Lung sound signals of -12 dB to 15 dB signal-to-noise ratio (SNR) were considered in testing experiments. The proposed model showed an average SNR improvement of 16.51 dB when evaluated with -12 dB LS signals. Our end-to-end model, with an average SNR improvement of 19.31 dB, outperforms the existing model when evaluated with ambient noise and fewer parameters. Conclusion: Based on the qualitative and quantitative findings in this study, it can be stated that Uformer is robust and generalized to be used in assisting the monitoring of respiratory conditions.
Abstract:The recent 'Mpox' outbreak, formerly known as 'Monkeypox', has become a significant public health concern and has spread to over 110 countries globally. The challenge of clinically diagnosing mpox early on is due, in part, to its similarity to other types of rashes. Computer-aided screening tools have been proven valuable in cases where Polymerase Chain Reaction (PCR) based diagnosis is not immediately available. Deep learning methods are powerful in learning complex data representations, but their efficacy largely depends on adequate training data. To address this challenge, we present the "Mpox Skin Lesion Dataset Version 2.0 (MSLD v2.0)" as a follow-up to the previously released openly accessible dataset, one of the first datasets containing mpox lesion images. This dataset contains images of patients with mpox and five other non-mpox classes (chickenpox, measles, hand-foot-mouth disease, cowpox, and healthy). We benchmark the performance of several state-of-the-art deep learning models, including VGG16, ResNet50, DenseNet121, MobileNetV2, EfficientNetB3, InceptionV3, and Xception, to classify mpox and other infectious skin diseases. In order to reduce the impact of racial bias, we utilize a color space data augmentation method to increase skin color variability during training. Additionally, by leveraging transfer learning implemented with pre-trained weights generated from the HAM10000 dataset, an extensive collection of pigmented skin lesion images, we achieved the best overall accuracy of $83.59\pm2.11\%$. Finally, the developed models are incorporated within a prototype web application to analyze uploaded skin images by a user and determine whether a subject is a suspected mpox patient.
Abstract:Performance degradation due to source domain mismatch is a longstanding challenge in deep learning-based medical image analysis, particularly for chest X-rays. Several methods have been proposed to address this domain shift, such as utilizing adversarial learning or multi-domain mixups to extract domain-invariant high-level features. However, these methods do not explicitly account for or regularize the content and style attributes of the extracted domain-invariant features. Recent studies have demonstrated that CNN models exhibit a strong bias toward styles (i.e., textures) rather than content, in stark contrast to the human-vision system. Explainable representations are paramount for a robust and generalizable understanding of medical images. Thus, the learned high-level semantic features need to be both content-specific, i.e., pathology-specific and domain-agnostic, as well as style invariant. Inspired by this, we propose a novel framework that improves cross-domain performances by focusing more on content while reducing style bias. We employ a style randomization module at both image and feature levels to create stylized perturbation features while preserving the content using an end-to-end framework. We extract the global features from the backbone model for the same chest X-ray with and without style randomized. We apply content consistency regularization between them to tweak the framework's sensitivity toward content markers for accurate predictions. Extensive experiments on unseen domain test datasets demonstrate that our proposed pipeline is more robust in the presence of domain shifts and achieves state-of-the-art performance. Our code is available via https://github.com/rafizunaed/domain_agnostic_content_aware_style_invariant.
Abstract:Pattern recognition from audio signals is an active research topic encompassing audio tagging, acoustic scene classification, music classification, and other areas. Spectrogram and mel-frequency cepstral coefficients (MFCC) are among the most commonly used features for audio signal analysis and classification. Recently, deep convolutional neural networks (CNN) have been successfully used for audio classification problems using spectrogram-based 2D features. In this paper, we present SpectNet, an integrated front-end layer that extracts spectrogram features within a CNN architecture that can be used for audio pattern recognition tasks. The front-end layer utilizes learnable gammatone filters that are initialized using mel-scale filters. The proposed layer outputs a 2D spectrogram image which can be fed into a 2D CNN for classification. The parameters of the entire network, including the front-end filterbank, can be updated via back-propagation. This training scheme allows for fine-tuning the spectrogram-image features according to the target audio dataset. The proposed method is evaluated in two different audio signal classification tasks: heart sound anomaly detection and acoustic scene classification. The proposed method shows a significant 1.02\% improvement in MACC for the heart sound classification task and 2.11\% improvement in accuracy for the acoustic scene classification task compared to the classical spectrogram image features. The source code of our experiments can be found at \url{https://github.com/mHealthBuet/SpectNet}
Abstract:Computer-aided disease diagnosis and prognosis based on medical images is a rapidly emerging field. Many Convolutional Neural Network (CNN) architectures have been developed by researchers for disease classification and localization from chest X-ray images. It is known that different thoracic disease lesions are more likely to occur in specific anatomical regions compared to others. Based on this knowledge, we first estimate a disease-dependent spatial probability, i.e., an anatomical prior, that indicates the probability of occurrence of a disease in a specific region in a chest X-ray image. Next, we develop a novel attention-based classification model that combines information from the estimated anatomical prior and automatically extracted chest region of interest (ROI) masks to provide attention to the feature maps generated from a deep convolution network. Unlike previous works that utilize various self-attention mechanisms, the proposed method leverages the extracted chest ROI masks along with the probabilistic anatomical prior information, which selects the region of interest for different diseases to provide attention. The proposed method shows superior performance in disease classification on the NIH ChestX-ray14 dataset compared to existing state-of-the-art methods while reaching an area under the ROC curve (AUC) of 0.8427. Regarding disease localization, the proposed method shows competitive performance compared to state-of-the-art methods, achieving an accuracy of 61% with an Intersection over Union (IoU) threshold of 0.3. The proposed method can also be generalized to other medical image-based disease classification and localization tasks where the probability of occurrence of the lesion is dependent on specific anatomical sites.
Abstract:Body-worn first-person vision (FPV) camera enables to extract a rich source of information on the environment from the subject's viewpoint. However, the research progress in wearable camera-based egocentric office activity understanding is slow compared to other activity environments (e.g., kitchen and outdoor ambulatory), mainly due to the lack of adequate datasets to train more sophisticated (e.g., deep learning) models for human activity recognition in office environments. This paper provides details of a large and publicly available office activity dataset (BON) collected in different office settings across three geographical locations: Barcelona (Spain), Oxford (UK) and Nairobi (Kenya), using a chest-mounted GoPro Hero camera. The BON dataset contains eighteen common office activities that can be categorised into person-to-person interactions (e.g., Chat with colleagues), person-to-object (e.g., Writing on a whiteboard), and proprioceptive (e.g., Walking). Annotation is provided for each segment of video with 5-seconds duration. Generally, BON contains 25 subjects and 2639 total segments. In order to facilitate further research in the sub-domain, we have also provided results that could be used as baselines for future studies.
Abstract:The recent monkeypox outbreak has become a public health concern due to its rapid spread in more than 40 countries outside Africa. Clinical diagnosis of monkeypox in an early stage is challenging due to its similarity with chickenpox and measles. In cases where the confirmatory Polymerase Chain Reaction (PCR) tests are not readily available, computer-assisted detection of monkeypox lesions could be beneficial for surveillance and rapid identification of suspected cases. Deep learning methods have been found effective in the automated detection of skin lesions, provided that sufficient training examples are available. However, as of now, such datasets are not available for the monkeypox disease. In the current study, we first develop the ``Monkeypox Skin Lesion Dataset (MSLD)" consisting skin lesion images of monkeypox, chickenpox, and measles. The images are mainly collected from websites, news portals, and publicly accessible case reports. Data augmentation is used to increase the sample size, and a 3-fold cross-validation experiment is set up. In the next step, several pre-trained deep learning models, namely, VGG-16, ResNet50, and InceptionV3 are employed to classify monkeypox and other diseases. An ensemble of the three models is also developed. ResNet50 achieves the best overall accuracy of $82.96(\pm4.57\%)$, while VGG16 and the ensemble system achieved accuracies of $81.48(\pm6.87\%)$ and $79.26(\pm1.05\%)$, respectively. A prototype web-application is also developed as an online monkeypox screening tool. While the initial results on this limited dataset are promising, a larger demographically diverse dataset is required to further enhance the generalizability of these models.
Abstract:Thoracic disease detection from chest radiographs using deep learning methods has been an active area of research in the last decade. Most previous methods attempt to focus on the diseased organs of the image by identifying spatial regions responsible for significant contributions to the model's prediction. In contrast, expert radiologists first locate the prominent anatomical structures before determining if those regions are anomalous. Therefore, integrating anatomical knowledge within deep learning models could bring substantial improvement in automatic disease classification. This work proposes an anatomy-aware attention-based architecture named Anatomy X-Net, that prioritizes the spatial features guided by the pre-identified anatomy regions. We leverage a semi-supervised learning method using the JSRT dataset containing organ-level annotation to obtain the anatomical segmentation masks (for lungs and heart) for the NIH and CheXpert datasets. The proposed Anatomy X-Net uses the pre-trained DenseNet-121 as the backbone network with two corresponding structured modules, the Anatomy Aware Attention (AAA) and Probabilistic Weighted Average Pooling (PWAP), in a cohesive framework for anatomical attention learning. Our proposed method sets new state-of-the-art performance on the official NIH test set with an AUC score of 0.8439, proving the efficacy of utilizing the anatomy segmentation knowledge to improve the thoracic disease classification. Furthermore, the Anatomy X-Net yields an averaged AUC of 0.9020 on the Stanford CheXpert dataset, improving on existing methods that demonstrate the generalizability of the proposed framework.