Abstract:With their prominent scene understanding and reasoning capabilities, pre-trained visual-language models (VLMs) such as GPT-4V have attracted increasing attention in robotic task planning. Compared with traditional task planning strategies, VLMs are strong in multimodal information parsing and code generation and show remarkable efficiency. Although VLMs demonstrate great potential in robotic task planning, they suffer from challenges like hallucination, semantic complexity, and limited context. To handle such issues, this paper proposes a multi-agent framework, i.e., GameVLM, to enhance the decision-making process in robotic task planning. In this study, VLM-based decision and expert agents are presented to conduct the task planning. Specifically, decision agents are used to plan the task, and the expert agent is employed to evaluate these task plans. Zero-sum game theory is introduced to resolve inconsistencies among different agents and determine the optimal solution. Experimental results on real robots demonstrate the efficacy of the proposed framework, with an average success rate of 83.3%.
Abstract:In cardiac Magnetic Resonance Imaging (MRI) analysis, simultaneous myocardial segmentation and T2 quantification are crucial for assessing myocardial pathologies. Existing methods often address these tasks separately, limiting their synergistic potential. To address this, we propose SQNet, a dual-task network integrating Transformer and Convolutional Neural Network (CNN) components. SQNet features a T2-refine fusion decoder for quantitative analysis, leveraging global features from the Transformer, and a segmentation decoder with multiple local region supervision for enhanced accuracy. A tight coupling module aligns and fuses CNN and Transformer branch features, enabling SQNet to focus on myocardium regions. Evaluation on healthy controls (HC) and acute myocardial infarction patients (AMI) demonstrates superior segmentation dice scores (89.3/89.2) compared to state-of-the-art methods (87.7/87.9). T2 quantification yields strong linear correlations (Pearson coefficients: 0.84/0.93) with label values for HC/AMI, indicating accurate mapping. Radiologist evaluations confirm SQNet's superior image quality scores (4.60/4.58 for segmentation, 4.32/4.42 for T2 quantification) over state-of-the-art methods (4.50/4.44 for segmentation, 3.59/4.37 for T2 quantification). SQNet thus offers accurate simultaneous segmentation and quantification, enhancing cardiac disease diagnosis, such as AMI.
Abstract:Federated Learning (FL), a privacy-oriented distributed ML paradigm, is being gaining great interest in Internet of Things because of its capability to protect participants data privacy. Studies have been conducted to address challenges existing in standard FL, including communication efficiency and privacy-preserving. But they cannot achieve the goal of making a tradeoff between communication efficiency and model accuracy while guaranteeing privacy. This paper proposes a Conditional Random Sampling (CRS) method and implements it into the standard FL settings (CRS-FL) to tackle the above-mentioned challenges. CRS explores a stochastic coefficient based on Poisson sampling to achieve a higher probability of obtaining zero-gradient unbiasedly, and then decreases the communication overhead effectively without model accuracy degradation. Moreover, we dig out the relaxation Local Differential Privacy (LDP) guarantee conditions of CRS theoretically. Extensive experiment results indicate that (1) in communication efficiency, CRS-FL performs better than the existing methods in metric accuracy per transmission byte without model accuracy reduction in more than 7% sampling ratio (# sampling size / # model size); (2) in privacy-preserving, CRS-FL achieves no accuracy reduction compared with LDP baselines while holding the efficiency, even exceeding them in model accuracy under more sampling ratio conditions.
Abstract:Dunhuang murals suffer from fading, breakage, surface brittleness and extensive peeling affected by prolonged environmental erosion. Image inpainting techniques are widely used in the field of digital mural inpainting. Generally speaking, for mural inpainting tasks with large area damage, it is challenging for any image inpainting method. In this paper, we design a multi-stage progressive reasoning network (MPR-Net) containing global to local receptive fields for murals inpainting. This network is capable of recursively inferring the damage boundary and progressively tightening the regional texture constraints. Moreover, to adaptively fuse plentiful information at various scales of murals, a multi-scale feature aggregation module (MFA) is designed to empower the capability to select the significant features. The execution of the model is similar to the process of a mural restorer (i.e., inpainting the structure of the damaged mural globally first and then adding the local texture details further). Our method has been evaluated through both qualitative and quantitative experiments, and the results demonstrate that it outperforms state-of-the-art image inpainting methods.
Abstract:With the improvement of sensor technology and significant algorithmic advances, the accuracy of remote heart rate monitoring technology has been significantly improved. Despite of the significant algorithmic advances, the performance of rPPG algorithm can degrade in the long-term, high-intensity continuous work occurred in evenings or insufficient light environments. One of the main challenges is that the lost facial details and low contrast cause the failure of detection and tracking. Also, insufficient lighting in video capturing hurts the quality of physiological signal. In this paper, we collect a large-scale dataset that was designed for remote heart rate estimation recorded with various illumination variations to evaluate the performance of the rPPG algorithm (Green, ICA, and POS). We also propose a low-light enhancement solution (technical solution) for remote heart rate estimation under the low-light condition. Using collected dataset, we found 1) face detection algorithm cannot detect faces in video captured in low light conditions; 2) A decrease in the amplitude of the pulsatile signal will lead to the noise signal to be in the dominant position; and 3) the chrominance-based method suffers from the limitation in the assumption about skin-tone will not hold, and Green and ICA method receive less influence than POS in dark illuminance environment. The proposed solution for rPPG process is effective to detect and improve the signal-to-noise ratio and precision of the pulsatile signal.
Abstract:Dunhuang murals are a collection of Chinese style and national style, forming a self-contained Chinese-style Buddhist art. It has very high historical and cultural value and research significance. Among them, the lines of Dunhuang murals are highly general and expressive. It reflects the character's distinctive character and complex inner emotions. Therefore, the outline drawing of murals is of great significance to the research of Dunhuang Culture. The contour generation of Dunhuang murals belongs to image edge detection, which is an important branch of computer vision, aims to extract salient contour information in images. Although convolution-based deep learning networks have achieved good results in image edge extraction by exploring the contextual and semantic features of images. However, with the enlargement of the receptive field, some local detail information is lost. This makes it impossible for them to generate reasonable outline drawings of murals. In this paper, we propose a novel edge detector based on self-attention combined with convolution to generate line drawings of Dunhuang murals. Compared with existing edge detection methods, firstly, a new residual self-attention and convolution mixed module (Ramix) is proposed to fuse local and global features in feature maps. Secondly, a novel densely connected backbone extraction network is designed to efficiently propagate rich edge feature information from shallow layers into deep layers. Compared with existing methods, it is shown on different public datasets that our method is able to generate sharper and richer edge maps. In addition, testing on the Dunhuang mural dataset shows that our method can achieve very competitive performance.
Abstract:Ultrasound-guided nerve block anesthesia (UGNB) is a high-tech visual nerve block anesthesia method that can observe the target nerve and its surrounding structures, the puncture needle's advancement, and local anesthetics spread in real-time. The key in UGNB is nerve identification. With the help of deep learning methods, the automatic identification or segmentation of nerves can be realized, assisting doctors in completing nerve block anesthesia accurately and efficiently. Here, we establish a public dataset containing 320 ultrasound images of brachial plexus (BP). Three experienced doctors jointly produce the BP segmentation ground truth and label brachial plexus trunks. We design a brachial plexus segmentation system (BPSegSys) based on deep learning. BPSegSys achieves experienced-doctor-level nerve identification performance in various experiments. We evaluate BPSegSys' performance in terms of intersection-over-union (IoU), a commonly used performance measure for segmentation experiments. Considering three dataset groups in our established public dataset, the IoU of BPSegSys are 0.5238, 0.4715, and 0.5029, respectively, which exceed the IoU 0.5205, 0.4704, and 0.4979 of experienced doctors. In addition, we show that BPSegSys can help doctors identify brachial plexus trunks more accurately, with IoU improvement up to 27%, which has significant clinical application value.
Abstract:White-box Adversarial Example (AE) attacks towards Deep Neural Networks (DNNs) have a more powerful destructive capacity than black-box AE attacks in the fields of AE strategies. However, almost all the white-box approaches lack interpretation from the point of view of DNNs. That is, adversaries did not investigate the attacks from the perspective of interpretable features, and few of these approaches considered what features the DNN actually learns. In this paper, we propose an interpretable white-box AE attack approach, DI-AA, which explores the application of the interpretable approach of the deep Taylor decomposition in the selection of the most contributing features and adopts the Lagrangian relaxation optimization of the logit output and L_p norm to further decrease the perturbation. We compare DI-AA with six baseline attacks (including the state-of-the-art attack AutoAttack) on three datasets. Experimental results reveal that our proposed approach can 1) attack non-robust models with comparatively low perturbation, where the perturbation is closer to or lower than the AutoAttack approach; 2) break the TRADES adversarial training models with the highest success rate; 3) the generated AE can reduce the robust accuracy of the robust black-box models by 16% to 31% in the black-box transfer attack.
Abstract:The most concentrated application of lower-limb rehabilitation exoskeleton (LLE) robot is that it can help paraplegics "re-walk". However, "walking" in daily life is more than just walking on flat ground with fixed gait. This paper focuses on variable gaits generation for LLE robot to adapt complex walking environment. Different from traditional gaits generator for biped robot, the generated gaits for LLEs should be comfortable to patients. Inspired by the pose graph optimization algorithm in SLAM, we propose a graph-based gait generation algorithm called gait graph optimization (GGO) to generate variable, functional and comfortable gaits from one base gait collected from healthy individuals to adapt the walking environment. Variants of walking problem, e.g., stride adjustment, obstacle avoidance, and stair ascent and descent, help verify the proposed approach in simulation and experimentation. We open source our implementation.