Abstract: Low-dose CT (LDCT) significantly reduces the radiation dose received by patients, thereby decreasing potential health risks. However, dose reduction introduces additional noise and artifacts, adversely affecting image quality and clinical diagnosis. In recent years, deep learning methods have proven feasible for low-dose CT imaging and have achieved significant results. Nevertheless, denoising methods based on convolutional neural networks (CNNs) are limited in their long-range modeling capability, while Transformer-based methods, although capable of powerful long-range modeling, suffer from high computational complexity. Furthermore, the denoised images predicted by deep learning techniques inevitably differ in noise distribution from normal-dose CT (NDCT) images, which can also degrade final image quality and diagnostic outcomes. This paper proposes CT-Mamba, a hybrid convolutional state space model for LDCT image denoising. The model combines the local feature extraction strengths of CNNs with Mamba's global modeling capability, enabling it to capture both local details and global context. In addition, a Mamba-driven deep noise power spectrum (NPS) loss function is designed to guide training, ensuring that the noise texture of denoised LDCT images closely resembles that of NDCT images and thereby enhancing overall image quality and diagnostic value. Experimental results show that CT-Mamba performs excellently in noise reduction, detail preservation, and noise texture optimization, while yielding radiomics features statistically similar to those of NDCT images (p > 0.05). CT-Mamba thus holds promise as a representative approach for applying the Mamba framework to LDCT denoising.
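To make the NPS idea concrete, below is a minimal sketch of a noise power spectrum matching loss in PyTorch. It is not the paper's Mamba-driven NPS loss; the noise-estimation step (a separable Gaussian high-pass), the kernel size, and the L1 spectral distance are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def _highpass(x: torch.Tensor, k: int = 9, sigma: float = 2.0) -> torch.Tensor:
    """Estimate the noise component of x by subtracting a Gaussian-blurred copy."""
    coords = torch.arange(k, dtype=x.dtype, device=x.device) - (k - 1) / 2
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    g = (g / g.sum()).view(1, 1, 1, k)
    blur = F.conv2d(x, g, padding=(0, k // 2))                     # horizontal pass
    blur = F.conv2d(blur, g.transpose(2, 3), padding=(k // 2, 0))  # vertical pass
    return x - blur

def nps_loss(denoised: torch.Tensor, ndct: torch.Tensor) -> torch.Tensor:
    """Penalize the distance between the 2D noise power spectra of the
    denoised output and the NDCT reference. Inputs: (B, 1, H, W) tensors."""
    nps_pred = torch.fft.fft2(_highpass(denoised)).abs() ** 2
    nps_ref = torch.fft.fft2(_highpass(ndct)).abs() ** 2
    return F.l1_loss(nps_pred, nps_ref)
```

In practice such a term would be added to a pixel-wise fidelity loss with a small weight, so that noise texture is matched without sacrificing structural accuracy.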
Abstract: During the early stages of respiratory virus outbreaks, such as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the efficient use of limited nasopharyngeal swabs for rapid and accurate screening is crucial for public health. In this study, we present a methodology that integrates attenuated total reflection-Fourier transform infrared spectroscopy (ATR-FTIR) with the adaptive iteratively reweighted penalized least squares (airPLS) preprocessing algorithm and a channel-wise attention-based partial least squares one-dimensional convolutional neural network (PLS-1D-CNN) model, enabling accurate screening of infected individuals within 10 minutes. Two cohorts of nasopharyngeal swab samples, comprising 126 and 112 samples from suspected SARS-CoV-2 Omicron variant cases, were collected at Beijing You'an Hospital for verification. Given that ATR-FTIR spectra are highly sensitive to variations in experimental conditions, which can affect their quality, we propose a biomolecular importance (BMI) evaluation method to assess signal quality across different conditions, validated by comparing BMI with PLS-GBM and PLS-RF results. For the ATR-FTIR signals in cohort 2, which exhibited a higher BMI, airPLS was used for signal preprocessing, followed by the channel-wise attention-based PLS-1D-CNN model for screening. The experimental results demonstrate that our model outperforms recently reported methods in respiratory virus spectral detection, achieving a screening accuracy of 96.48%, a sensitivity of 96.24%, a specificity of 97.14%, an F1-score of 96.12%, and an AUC of 0.99. It meets the World Health Organization (WHO) recommended criteria for an acceptable product, namely a sensitivity of 95.00% or greater and a specificity of 97.00% or greater, for testing prior SARS-CoV-2 infection in moderate-to-high-volume scenarios.
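As an illustration of the channel-wise attention component, here is a minimal squeeze-and-excitation style block wrapped in a tiny 1D CNN spectral classifier. The layer sizes and the SE form of the attention are assumptions; the paper's exact PLS-1D-CNN architecture and its PLS feature-reduction front end are not reproduced.

```python
import torch
import torch.nn as nn

class ChannelAttention1D(nn.Module):
    """Squeeze-and-excitation style reweighting of Conv1d feature channels."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, C, L)
        w = self.fc(x.mean(dim=-1))            # squeeze over the spectral axis
        return x * w.unsqueeze(-1)             # excite: per-channel weights

class Spectral1DCNN(nn.Module):
    """Tiny 1D CNN for binary screening from a preprocessed spectrum."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            ChannelAttention1D(16),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(16, 2)           # infected vs. not infected

    def forward(self, x):                      # x: (B, 1, spectrum_length)
        return self.head(self.features(x).squeeze(-1))
```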
Abstract: Gait benchmarks empower numerous promising research fields such as gait recognition and humanoid locomotion. Despite the growing focus on gait analysis, the research community is hindered by the limitations of currently available databases, which mostly consist of videos or images with limited labeling. In this paper, we introduce GaitMotion, a multitask dataset that leverages wearable sensors to capture the real-time movement of patients with pathological gait. The dataset offers extensive ground-truth labeling for multiple tasks, including step/stride segmentation and step/stride length prediction, giving researchers a more holistic understanding of the gait disturbances linked to neurological impairments. The wearable gait-analysis suit captures the gait cycle, pattern, and parameters for both normal and pathological subjects. These data may prove beneficial for healthcare products focused on patient progress monitoring and post-disease recovery, for forensic technologies aimed at person re-identification, and for biomechanics research supporting the development of humanoid robots. Moreover, the analysis accounts for drift in the data distribution across individual subjects, which can be attributed to each participant's unique behavioral habits or to potential displacement of the sensors. Stride-length variance for normal, Parkinson's, and stroke subjects is compared to characterize pathological walking patterns. As a baseline and benchmark, we report stride-length prediction errors of 14.1, 13.3, and 12.2 centimeters for normal, Parkinson's, and stroke gaits, respectively. We also analyze the gait characteristics of normal and pathological gaits in terms of the gait cycle and gait parameters.
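For readers new to the stride-segmentation task this dataset labels, here is a minimal sketch of a common heuristic: peak detection on the sagittal-plane angular velocity from a foot-worn gyroscope. The thresholds and the choice of axis are illustrative assumptions, not the dataset's reference pipeline.

```python
import numpy as np
from scipy.signal import find_peaks

def segment_strides(gyro_z: np.ndarray, fs: float = 100.0) -> np.ndarray:
    """Return sample indices of candidate stride events.
    gyro_z: sagittal-plane angular velocity (rad/s); fs: sampling rate (Hz)."""
    # Strides rarely occur faster than ~2 per second, so enforce a minimum gap
    # of half a second between detected events.
    peaks, _ = find_peaks(gyro_z, height=1.0, distance=int(0.5 * fs))
    return peaks

# Example downstream use: stride durations from consecutive events.
# stride_times = np.diff(segment_strides(gyro_z)) / fs
```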
Abstract: Photoacoustic microscopy (PAM) is an implementation of photoacoustic imaging (PAI) for visualizing 3D bio-structures, realized by raster scanning of the tissue. However, the three critical imaging parameters involved, namely imaging speed, lateral resolution, and penetration depth, mutually constrain one another: improving one parameter degrades the other two, which limits the overall performance of a PAM system. Here, we propose to break these limitations through hardware-software co-design. Starting from low-lateral-resolution, low-sampling-rate AR-PAM imaging, which offers deep penetration, we aim to enhance the lateral resolution and upsample the images, so that high speed, super resolution, and deep penetration (HSD-PAM) can be achieved simultaneously. Data-driven algorithms are a promising approach to this problem, so we propose a dedicated dual-branch fusion network comprising a high-resolution branch and a high-speed branch. Thanks to the availability of a switchable AR-OR-PAM imaging system, corresponding low-resolution, undersampled AR-PAM and high-resolution, fully sampled OR-PAM image pairs are used to train the network. Extensive simulation and in vivo experiments validate the trained model, and the enhancement results show that the proposed algorithm achieves the best perceptual and quantitative image quality. As a result, imaging speed is increased 16-fold and lateral resolution is improved 5-fold, while the deep-penetration merit of the AR-PAM modality is preserved.
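To illustrate the two-branch design in code, below is a minimal dual-branch fusion network sketch in PyTorch. The branch depths, channel counts, bilinear upsampling, and concatenation-based fusion are assumptions for illustration; the paper's actual HSD-PAM network is not reproduced here.

```python
import torch
import torch.nn as nn

class DualBranchFusion(nn.Module):
    def __init__(self, scale: int = 4):
        super().__init__()
        # High-speed branch: recover the full sampling grid (spatial upsampling).
        self.speed = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False),
        )
        # High-resolution branch: sharpen lateral detail at the target grid.
        self.resolution = nn.Sequential(
            nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False),
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.fuse = nn.Conv2d(64, 1, 3, padding=1)

    def forward(self, x):   # x: (B, 1, h, w) undersampled low-resolution AR-PAM
        return self.fuse(torch.cat([self.speed(x), self.resolution(x)], dim=1))
```

Training would pair undersampled AR-PAM inputs with fully sampled OR-PAM targets, as the abstract describes for the switchable AR-OR-PAM system.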
Abstract: The Vernier effect has seen extensive application in optical structures, serving to extend the free spectral range (FSR). A large FSR is vital in a myriad of applications including multiplexers, enabling a broad, clear band comparable to the C-band to accommodate a maximum number of channels. Nevertheless, a large FSR often conflicts with bending loss, as it necessitates a smaller resonator radius, thus increasing the insertion loss in the bending portion. To expand the FSR without amplifying bending loss, we employed cascaded and parallel racetrack resonators and ring resonators of varying radii that exhibit the Vernier effect. In this study, we designed, fabricated, and tested multiple types of racetrack resonators to validate the Vernier effect and its FSR-extension capability. Our investigations substantiate that the Vernier effect, based on cascaded and series-coupled micro-ring resonator (MRR) sensors, can efficiently mitigate intra-channel crosstalk at higher data rates by providing larger input-to-through suppression, paving the way for future applications.
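A quick numerical illustration of the FSR extension: the single-resonator relation FSR = lambda^2 / (n_g L) and the Vernier relation FSR_V = FSR1 * FSR2 / |FSR1 - FSR2| are standard, but the group index and circumference values below are illustrative assumptions, not the fabricated devices' parameters.

```python
wavelength = 1.55e-6      # operating wavelength (m), C-band
n_g = 4.2                 # group index (assumed, typical of a silicon wire)

def fsr(circumference_m: float) -> float:
    """Free spectral range of a single resonator of given round-trip length."""
    return wavelength ** 2 / (n_g * circumference_m)

fsr1 = fsr(100e-6)        # resonator 1: 100 um round trip
fsr2 = fsr(110e-6)        # resonator 2: 110 um round trip
fsr_vernier = fsr1 * fsr2 / abs(fsr1 - fsr2)
print(f"FSR1 = {fsr1 * 1e9:.2f} nm, FSR2 = {fsr2 * 1e9:.2f} nm, "
      f"Vernier FSR = {fsr_vernier * 1e9:.2f} nm")
# With these values the composite FSR is ~10x that of resonator 1, without
# shrinking either ring, which is the point of avoiding extra bending loss.
```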
Abstract: Face recognition service providers protect face privacy by extracting compact and discriminative facial features (representations) from images and storing them for real-time recognition. However, such features can still be exploited to recover the appearance of the original face by building a reconstruction network. Although several privacy-preserving methods have been proposed, they enhance face privacy at the expense of recognition accuracy. In this paper, we propose an adversarial features-based face privacy protection (AdvFace) approach that generates privacy-preserving adversarial features, disrupting the mapping from adversarial features to facial images to defend against reconstruction attacks. To this end, we design a shadow model that simulates the attacker's behavior to capture the mapping from facial features to images, and we generate adversarial latent noise to disrupt this mapping. The adversarial features, rather than the original features, are stored in the server's database, so that leaked features cannot expose facial information. Moreover, AdvFace requires no changes to the face recognition network and can be deployed as a privacy-enhancing plugin in existing face recognition systems. Extensive experimental results demonstrate that AdvFace outperforms state-of-the-art face privacy-preserving methods in defending against reconstruction attacks while maintaining face recognition accuracy.
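Below is a minimal sketch of the core idea: perturb a stored feature so that a shadow reconstruction network can no longer invert it. The PGD-style update, step sizes, and perturbation budget are assumptions, and `shadow` stands in for a trained feature-to-image decoder; the recognition-accuracy constraint of the actual method is not enforced here.

```python
import torch
import torch.nn.functional as F

def adversarial_feature(feat, face, shadow, eps=0.5, alpha=0.05, steps=20):
    """Perturb `feat` to maximize the shadow decoder's reconstruction error.
    feat: (B, D) facial feature; face: (B, C, H, W) image; shadow: decoder."""
    delta = torch.zeros_like(feat, requires_grad=True)
    for _ in range(steps):
        recon = shadow(feat + delta)
        loss = F.mse_loss(recon, face)       # how well the attacker reconstructs
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()   # ascend the reconstruction error
            delta.clamp_(-eps, eps)              # keep the perturbation bounded
            delta.grad.zero_()
    return (feat + delta).detach()           # store this in place of `feat`
```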
Abstract: One-shot object detection (OSOD) aims to detect all object instances of the category specified by a query image. Most existing OSOD studies endeavor to explore effective cross-image correlation and alleviate semantic feature misalignment, yet they ignore the model's bias towards the base classes and its degraded generalization on novel classes. Observing this, we propose a novel framework, the Base-class Suppression and Prior Guidance (BSPG) network, to overcome this problem. Specifically, objects of base categories are explicitly detected by a base-class predictor and adaptively eliminated by our base-class suppression module. Moreover, a prior guidance module is designed to compute the correlation of high-level features in a non-parametric manner, producing a class-agnostic prior map that provides the target features with rich semantic cues and guides the subsequent detection process. Equipped with these two modules, the model gains strong discriminative ability to distinguish target objects from distractors belonging to the base classes. Extensive experiments show that our method outperforms previous techniques by a large margin and achieves new state-of-the-art performance under various evaluation settings.
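To make the non-parametric prior concrete, here is a minimal sketch that correlates target-image features with a pooled query embedding via cosine similarity. The global pooling and the clamping of negative responses are assumptions for illustration; this is not the exact BSPG prior guidance module.

```python
import torch
import torch.nn.functional as F

def prior_map(target_feat: torch.Tensor, query_feat: torch.Tensor) -> torch.Tensor:
    """target_feat: (B, C, H, W) features of the search image;
    query_feat: (B, C, h, w) features of the query patch.
    Returns a (B, 1, H, W) class-agnostic prior map with no learned parameters."""
    q = F.adaptive_avg_pool2d(query_feat, 1)          # (B, C, 1, 1) query vector
    sim = F.cosine_similarity(target_feat, q, dim=1)  # (B, H, W) correlation
    return sim.unsqueeze(1).clamp(min=0)              # keep positive evidence only
```

Because the map is computed without trainable weights, it cannot overfit to base classes, which is what makes it useful as class-agnostic guidance.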
Abstract: Estimating the composition and concentration of ambient gases is crucial for industrial gas safety. Although gas identification and concentration estimation algorithms have been proposed, they still suffer from serious limitations, particularly in meeting industrial demands. One example is that the length of data collected in an industrial setting tends to vary, and conventional algorithms cannot analyze such variable-length data effectively. Trimming the data preserves only steady-state values, inevitably losing vital information. This paper proposes a gas identification and concentration estimation model called GCN-ViT (GViT); we view the sensor data as a one-way chain that is only downscaled, retaining the majority of the original information. The GViT model can directly take sensor arrays' variable-length real-time signals as input. We validated the model on a dataset of 12-hour uninterrupted monitoring of two randomly varying gas mixtures, CO-ethylene and methane-ethylene. Gas identification accuracy reaches 97.61%, the R2 of pure-gas concentration estimation is above 99.5% on average, and the R2 of mixed-gas concentration estimation is above 95% on average.
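The sketch below illustrates one way to handle variable-length sensor-array signals: pool each channel's chain of readings down to a fixed number of tokens, then encode the tokens with a small transformer. The token count, model sizes, and use of PyTorch's stock TransformerEncoder are assumptions; this mirrors the "downscale the chain, keep most of the information" idea but is not the GViT model.

```python
import torch
import torch.nn as nn

class VarLenGasModel(nn.Module):
    def __init__(self, n_sensors: int = 4, n_tokens: int = 32,
                 d_model: int = 64, n_classes: int = 2):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool1d(n_tokens)    # any input length -> n_tokens
        self.embed = nn.Linear(n_sensors, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.cls = nn.Linear(d_model, n_classes)      # gas identity
        self.reg = nn.Linear(d_model, 2)              # two mixture concentrations

    def forward(self, x):                             # x: (B, n_sensors, L), any L
        tokens = self.embed(self.pool(x).transpose(1, 2))  # (B, n_tokens, d_model)
        h = self.encoder(tokens).mean(dim=1)
        return self.cls(h), self.reg(h)
```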
Abstract: Recent advances in wearable devices and the Internet of Things (IoT) have led to massive growth in sensor data generated on edge devices. Labeling such massive data for classification tasks has proven challenging. In addition, data generated by different users carry various personal attributes and edge heterogeneity, making it impractical to develop a single global model that adapts well to all users. Concerns over data privacy and communication costs also prohibit centralized data accumulation and training. This paper proposes a novel personalized semi-supervised federated learning (SemiPFL) framework to support edge users who have no labels or limited labeled datasets, along with a sizable amount of unlabeled data that is insufficient to train a well-performing model. In this framework, edge users collaborate to train a hyper-network on the server, which generates personalized autoencoders for each user. After receiving updates from edge users, the server produces a set of base models for each user, which the user aggregates locally using their own labeled dataset. We comprehensively evaluate the proposed framework on various public datasets and demonstrate that SemiPFL outperforms state-of-the-art federated learning frameworks under the same assumptions. We also show that the solution performs well for users without labeled datasets or with limited labeled datasets, and that performance increases with the amount of labeled data and the number of users, signifying the effectiveness of SemiPFL in handling edge heterogeneity and limited annotation. By leveraging personalized semi-supervised learning, SemiPFL dramatically reduces the need for annotating data while preserving privacy in a wide range of application scenarios, from wearable health to IoT.
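As a toy illustration of the server-side hyper-network idea, the sketch below maps a learned user embedding to the weights of a single personalized encoder layer. All sizes, the embedding-per-user scheme, and the one-layer output are assumptions; SemiPFL's actual hyper-network and aggregation protocol are not reproduced here.

```python
import torch
import torch.nn as nn

class HyperNet(nn.Module):
    """Generates per-user encoder weights from a learned user embedding."""
    def __init__(self, n_users: int, emb_dim: int = 16,
                 in_dim: int = 32, hid_dim: int = 8):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, emb_dim)
        self.gen_w = nn.Linear(emb_dim, hid_dim * in_dim)
        self.gen_b = nn.Linear(emb_dim, hid_dim)
        self.in_dim, self.hid_dim = in_dim, hid_dim

    def forward(self, user_id: torch.Tensor):
        e = self.user_emb(user_id)                     # (1, emb_dim)
        W = self.gen_w(e).view(self.hid_dim, self.in_dim)
        b = self.gen_b(e).view(self.hid_dim)
        return W, b                                    # personalized parameters

# The edge user applies the generated layer locally, e.g. z = x @ W.T + b,
# and only the hyper-network (not raw data) is updated on the server.
```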
Abstract: As a special type of object detection, pedestrian detection in generic scenes has made significant progress when trained on large amounts of manually labeled data. However, models trained on generic datasets perform poorly when applied directly to specific scenes, whose particular viewpoints, lighting, and backgrounds make their data distributions very different from those of generic datasets. To make generic-scene pedestrian detectors work well in specific scenes, labeled data from those scenes are needed to adapt the models. Manual labeling, however, costs considerable time and money: for every new specific scene, large numbers of images must be annotated, the annotations are not pixel-accurate, and different annotators produce inconsistent labels. In this paper, we propose an ACP-based method: with the help of augmented reality, we build virtual worlds of specific scenes and have virtual pedestrians walk wherever they could plausibly appear, thereby addressing the lack of labeled data. The results show that data from the virtual world help adapt generic pedestrian detectors to specific scenes.