Ji He

SS-CTML: Self-Supervised Cross-Task Mutual Learning for CT Image Reconstruction

Dec 31, 2024

Gaofeng Chen, Yaoduo Zhang, Li Huang, Pengfei Wang, Wenyu Zhang, Dong Zeng, Jianhua Ma, Ji He

Abstract: Supervised deep-learning (SDL) techniques with paired training datasets have been widely studied for X-ray computed tomography (CT) image reconstruction. However, because paired training datasets are difficult to obtain in clinical routine, SDL methods are still far from common use in clinical practice. In recent years, self-supervised deep-learning (SSDL) techniques have shown great potential for CT image reconstruction. In this work, we propose a self-supervised cross-task mutual learning (SS-CTML) framework for CT image reconstruction. Specifically, a sparse-view sinogram and a limited-view sinogram are first extracted from a full-view sinogram, which results in three individual reconstruction tasks, i.e., full-view CT (FVCT) reconstruction, sparse-view CT (SVCT) reconstruction, and limited-view CT (LVCT) reconstruction. Then, three neural networks are constructed for the three reconstruction tasks. Because the ultimate goal of all three tasks is to reconstruct high-quality CT images, we construct a set of cross-task mutual learning objectives for the three tasks, so that the three neural networks can be optimized in a self-supervised manner by learning from each other. Clinical datasets are adopted to evaluate the effectiveness of the proposed framework. Experimental results demonstrate that the SS-CTML framework can obtain promising CT image reconstruction performance in terms of both quantitative and qualitative measurements.
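
As a rough illustration of the data-extraction step described in this abstract, the sketch below derives a sparse-view and a limited-view sinogram from a full-view sinogram by plain array slicing. The array layout (views along the first axis), the function name `split_views`, the subsampling stride, and the limited angular fraction are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def split_views(full_sino, sparse_stride=4, limited_fraction=0.5):
    """Derive sparse-view and limited-view sinograms from a full-view one.

    full_sino: array of shape (n_views, n_detectors); this layout is assumed.
    sparse_stride and limited_fraction are hypothetical illustration values.
    """
    n_views = full_sino.shape[0]
    # Sparse-view: keep every `sparse_stride`-th projection over the full arc.
    sparse_sino = full_sino[::sparse_stride]
    # Limited-view: keep a contiguous block of projections covering part of the arc.
    n_limited = int(n_views * limited_fraction)
    limited_sino = full_sino[:n_limited]
    return sparse_sino, limited_sino

# Synthetic stand-in for a full-view sinogram: 360 views, 512 detector bins.
full = np.random.default_rng(0).random((360, 512))
sparse, limited = split_views(full)
print(sparse.shape, limited.shape)
```

Each of the three arrays (`full`, `sparse`, `limited`) would then feed its own reconstruction network; how the mutual learning objectives couple those networks is not detailed in the abstract and is not sketched here.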

Crack-EdgeSAM Self-Prompting Crack Segmentation System for Edge Devices

Dec 10, 2024

Yingchu Wang, Ji He, Shijie Yu

Abstract: Structural health monitoring (SHM) is essential for the early detection of infrastructure defects, such as cracks in concrete bridge piers, but it often faces challenges in efficiency and accuracy in complex environments. Although the Segment Anything Model (SAM) achieves excellent segmentation performance, its computational demands limit its suitability for real-time applications on edge devices. To address these challenges, this paper proposes Crack-EdgeSAM, a self-prompting crack segmentation system that integrates YOLOv8 for generating prompt boxes and a fine-tuned EdgeSAM model for crack segmentation. To ensure computational efficiency, the method employs ConvLoRA, a Parameter-Efficient Fine-Tuning (PEFT) technique, along with DiceFocalLoss to fine-tune the EdgeSAM model. Our experimental results on public datasets and on automatic inspections by a climbing robot demonstrate that the system achieves high segmentation accuracy and significantly enhanced inference speed compared to the most recent methods. Notably, the system processes 1024 x 1024 pixel images at 46 FPS on our PC and 8 FPS on Jetson Orin Nano.
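
The name DiceFocalLoss suggests a loss that adds a Dice term to a focal term. The sketch below shows one plausible form of such a loss for binary masks in NumPy. The equal weighting of the two terms, the `gamma` value, and the function name `dice_focal_loss` are assumptions; the abstract does not state which implementation or hyperparameters were used.

```python
import numpy as np

def dice_focal_loss(probs, target, gamma=2.0, eps=1e-6):
    """Sum of a soft Dice loss and a binary focal loss (assumed equal weights).

    probs: predicted foreground probabilities in [0, 1].
    target: binary ground-truth mask of the same shape.
    """
    probs = np.clip(probs.astype(float), eps, 1.0 - eps)
    target = target.astype(float)
    # Soft Dice loss: 1 - 2|P.T| / (|P| + |T|), smoothed by eps.
    intersection = np.sum(probs * target)
    dice = 1.0 - (2.0 * intersection + eps) / (np.sum(probs) + np.sum(target) + eps)
    # Focal loss: cross-entropy down-weighted by (1 - p_t)^gamma on easy pixels.
    p_t = np.where(target == 1.0, probs, 1.0 - probs)
    focal = np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t))
    return dice + focal

# Toy 4x4 mask with a thin vertical "crack" in the second column.
mask = np.zeros((4, 4))
mask[:, 1] = 1.0
loss_perfect = dice_focal_loss(mask, mask)
loss_inverted = dice_focal_loss(1.0 - mask, mask)
print(loss_perfect, loss_inverted)
```

The Dice term is less sensitive to the heavy class imbalance typical of thin cracks, while the focal term concentrates the pixel-wise penalty on hard pixels, which is a common motivation for combining the two.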

Exploring Depth Information for Detecting Manipulated Face Videos

Nov 27, 2024

Haoyue Wang, Sheng Li, Ji He, Zhenxing Qian, Xinpeng Zhang, Shaolin Fan

Abstract: Face manipulation detection has been receiving a lot of attention because of its importance for the reliability and security of face images and videos. Recent studies focus on using auxiliary information or prior knowledge to capture robust manipulation traces, and these approaches have been shown to be promising. The face depth map, an important face feature that has been shown to be effective in other areas such as face recognition and face detection, has unfortunately received little attention in the literature on face manipulation detection. In this paper, we explore the possibility of incorporating the face depth map as auxiliary information for robust face manipulation detection. To this end, we first propose a Face Depth Map Transformer (FDMT) to estimate the face depth map patch by patch from an RGB face image, which is able to capture the local depth anomaly created by manipulation. The estimated face depth map is then treated as auxiliary information and integrated with the backbone features using a newly designed Multi-head Depth Attention (MDA) mechanism. We also propose an RGB-Depth Inconsistency Attention (RDIA) module to effectively capture the inter-frame inconsistency for multi-frame input. Various experiments demonstrate the advantage of our proposed method for face manipulation detection.

* 12 pages, 10 figures. arXiv admin note: substantial text overlap with arXiv:2212.14230

Covert Communication in Hybrid Microwave/mmWave A2G Systems with Transmission Mode Selection

Feb 01, 2023

Wenhao Zhang, Ji He, Yulong Shen, Xiaohong Jiang

Abstract: This paper investigates covert communication in an air-to-ground (A2G) system, where a UAV (Alice) can adopt the omnidirectional microwave (OM) or directional mmWave (DM) transmission mode to transmit covert data to a ground user (Bob) while subject to detection by an adversary (Willie). For both the OM and DM modes, we first conduct theoretical analysis to reveal the inherent relationship between the transmit rate/transmit power and basic covert performance metrics in terms of detection error probability (DEP), effective covert rate (ECR), and covert Shannon capacity (CSC). To facilitate the transmission mode selection at Alice, we then explore the optimization of transmit rate and transmit power for ECR/CSC maximization under the OM and DM modes, and further propose a hybrid OM/DM transmission mode which allows the UAV to adaptively select between the OM and DM modes to achieve the maximum ECR and CSC at a given UAV location. Finally, extensive numerical results are provided to illustrate the covert performance of the A2G system under different transmission modes, and to demonstrate that the hybrid OM/DM transmission mode outperforms the pure OM or DM mode in terms of covert performance.

Radon Inversion via Deep Learning

Aug 09, 2018

Ji He, Jianhua Ma

Abstract: The Radon transform is widely used in the physical and life sciences, and one of its major applications is X-ray computed tomography (X-ray CT), which is significant in modern health examination. The Radon inversion, or image reconstruction, is challenging due to potentially defective Radon projections. Conventionally, the reconstruction process contains several ad hoc stages to approximate the corresponding Radon inversion. Each of the stages is highly dependent on the results of the previous stage. In this paper, we propose a novel unified framework for Radon inversion via deep learning (DL). The Radon inversion can be approximated by the proposed framework in an end-to-end fashion instead of being processed step by step in multiple stages. For simplicity, the proposed framework is referred to as iRadonMap (inverse Radon transform approximation). Specifically, we implement the iRadonMap as an appropriate neural network, whose architecture can be divided into two segments. In the first segment, a learnable fully-connected filtering layer is used to filter the Radon projections along the view-angle direction, which is followed by a learnable sinusoidal back-projection layer to transfer the filtered Radon projections into an image. The second segment is a common neural network architecture to further improve the reconstruction performance in the image domain. The iRadonMap is optimized as a whole by training on a large number of generic images from the ImageNet database. To evaluate the performance of the iRadonMap, clinical patient data is used. Qualitative results show promising reconstruction performance of the iRadonMap.
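
To make the "sinusoidal back-projection" idea concrete, the sketch below implements a fixed, non-learnable parallel-beam back-projection in NumPy: each image pixel reads the sinogram along the sinusoid s = x cos(theta) + y sin(theta). This is the classical operation, not the learnable layer from the paper; the filtering step is omitted, and the parallel-beam geometry, nearest-neighbour lookup, and the name `back_project` are assumptions made for illustration.

```python
import numpy as np

def back_project(sinogram, angles, size):
    """Unfiltered parallel-beam back-projection with nearest-neighbour lookup.

    sinogram: array (n_views, n_detectors); angles: view angles in radians.
    """
    n_det = sinogram.shape[1]
    # Pixel coordinates centred on the image midpoint.
    coords = np.arange(size) - (size - 1) / 2.0
    x, y = np.meshgrid(coords, coords)
    image = np.zeros((size, size))
    for view, theta in zip(sinogram, angles):
        # Each pixel traces a sinusoid s(theta) = x cos(theta) + y sin(theta).
        s = x * np.cos(theta) + y * np.sin(theta)
        idx = np.rint(s + (n_det - 1) / 2.0).astype(int)
        valid = (idx >= 0) & (idx < n_det)
        image[valid] += view[idx[valid]]
    return image / len(angles)

# Sinogram of a single point at the image centre: a constant line at the
# central detector bin in every view.
n_views, n_det = 60, 33
sino = np.zeros((n_views, n_det))
sino[:, n_det // 2] = 1.0
angles = np.linspace(0.0, np.pi, n_views, endpoint=False)
recon = back_project(sino, angles, size=33)
peak = np.unravel_index(recon.argmax(), recon.shape)
print(peak)
```

Without the filtering step the result is a blurred version of the point, which is why a filter is applied along the projections before back-projecting.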

Reinforcement Learning with External Knowledge and Two-Stage Q-functions for Predicting Popular Reddit Threads

Apr 20, 2017

Ji He, Mari Ostendorf, Xiaodong He

Abstract: This paper addresses the problem of predicting the popularity of comments in an online discussion forum using reinforcement learning, particularly addressing two challenges that arise from having natural language state and action spaces. First, the state representation, which characterizes the history of comments tracked in a discussion at a particular point, is augmented to incorporate the global context represented by discussions on world events available in an external knowledge source. Second, a two-stage Q-learning framework is introduced, making it feasible to search the combinatorial action space while also accounting for redundancy among sub-actions. We experiment with five Reddit communities, showing that the two methods improve over previously reported results on this task.

Deep Reinforcement Learning with a Combinatorial Action Space for Predicting Popular Reddit Threads

Sep 17, 2016

Ji He, Mari Ostendorf, Xiaodong He, Jianshu Chen, Jianfeng Gao, Lihong Li, Li Deng

Abstract: We introduce an online popularity prediction and tracking task as a benchmark task for reinforcement learning with a combinatorial, natural language action space. A specified number of discussion threads predicted to be popular are recommended, chosen from a fixed window of recent comments to track. Novel deep reinforcement learning architectures are studied for effective modeling of the value function associated with actions comprised of interdependent sub-actions. The proposed model, which represents dependence between sub-actions through a bi-directional LSTM, gives the best performance across different experimental configurations and domains, and it also generalizes well with varying numbers of recommendation requests.

* To be published in EMNLP 2016, 11 pages

Deep Reinforcement Learning with a Natural Language Action Space

Jun 08, 2016

Ji He, Jianshu Chen, Xiaodong He, Jianfeng Gao, Lihong Li, Li Deng, Mari Ostendorf

Abstract: This paper introduces a novel architecture for reinforcement learning with deep neural networks designed to handle state and action spaces characterized by natural language, as found in text-based games. Termed a deep reinforcement relevance network (DRRN), the architecture represents action and state spaces with separate embedding vectors, which are combined with an interaction function to approximate the Q-function in reinforcement learning. We evaluate the DRRN on two popular text games, showing superior performance over other deep Q-learning architectures. Experiments with paraphrased action descriptions show that the model is extracting meaning rather than simply memorizing strings of text.
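
The core idea of separate state and action embeddings combined by an interaction function can be sketched in a few lines of NumPy. In the sketch below the interaction function is taken to be an inner product, which is one possible choice and an assumption here; the single-layer embeddings, bag-of-words inputs, dimensions, and random untrained weights are likewise hypothetical stand-ins rather than the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(bow, weights):
    """One-layer embedding of a bag-of-words vector (tanh nonlinearity)."""
    return np.tanh(weights @ bow)

vocab, hidden = 50, 8
W_state = rng.normal(scale=0.1, size=(hidden, vocab))   # state-side network
W_action = rng.normal(scale=0.1, size=(hidden, vocab))  # action-side network

def q_values(state_bow, action_bows):
    """Q(s, a_i) as the inner product of state and action embeddings."""
    h_s = embed(state_bow, W_state)
    h_a = np.stack([embed(a, W_action) for a in action_bows])
    return h_a @ h_s

# One state text and three candidate action texts, as binary bag-of-words.
state = rng.integers(0, 2, size=vocab).astype(float)
actions = [rng.integers(0, 2, size=vocab).astype(float) for _ in range(3)]
q = q_values(state, actions)
best_action = int(np.argmax(q))
print(q, best_action)
```

Because each action is embedded independently, the number of candidate actions can change from step to step without changing the network, which suits text games where the available choices vary.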

* accepted by ACL 2016

Recurrent Reinforcement Learning: A Hybrid Approach

Nov 19, 2015

Xiujun Li, Lihong Li, Jianfeng Gao, Xiaodong He, Jianshu Chen, Li Deng, Ji He

Abstract: Successful applications of reinforcement learning in real-world problems often require dealing with partially observable states. It is in general very challenging to construct and infer hidden states, as they often depend on the agent's entire interaction history and may require substantial domain knowledge. In this work, we investigate a deep-learning approach to learning the representation of states in partially observable tasks, with minimal prior knowledge of the domain. In particular, we propose a new family of hybrid models that combine the strengths of both supervised learning (SL) and reinforcement learning (RL), trained in a joint fashion. The SL component can be a recurrent neural network (RNN) or its long short-term memory (LSTM) version, which is able to capture long-term dependency on history and thus provides an effective way of learning the representation of hidden states. The RL component is a deep Q-network (DQN) that learns to optimize the control for maximizing long-term rewards. Extensive experiments in a direct mailing campaign problem demonstrate the effectiveness and advantages of the proposed approach, which performs the best among a set of previous state-of-the-art methods.

* 11 pages, 6 figures

End-to-end Learning of LDA by Mirror-Descent Back Propagation over a Deep Architecture

Nov 01, 2015

Jianshu Chen, Ji He, Yelong Shen, Lin Xiao, Xiaodong He, Jianfeng Gao, Xinying Song, Li Deng

Abstract: We develop a fully discriminative learning approach for the supervised Latent Dirichlet Allocation (LDA) model using Back Propagation (i.e., BP-sLDA), which maximizes the posterior probability of the prediction variable given the input document. Different from traditional variational learning or Gibbs sampling approaches, the proposed learning method applies (i) the mirror descent algorithm for maximum a posteriori inference and (ii) back propagation over a deep architecture together with stochastic gradient/mirror descent for model parameter estimation, leading to scalable and end-to-end discriminative learning of the model. As a byproduct, we also apply this technique to develop a new learning method for the traditional unsupervised LDA model (i.e., BP-LDA). Experimental results on three real-world regression and classification tasks show that the proposed methods significantly outperform previous supervised topic models and neural networks, and are on par with deep neural networks.
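
Mirror descent on the probability simplex with an entropy mirror map reduces to a multiplicative (exponentiated-gradient) update followed by renormalisation. The sketch below applies that generic update to maximum a posteriori inference of one document's topic proportions under a fixed topic-word matrix. It is a minimal illustration under stated assumptions, not the paper's exact algorithm: the objective, the symmetric Dirichlet prior with `alpha > 1`, the constant step size, the length normalisation, and the name `map_topic_proportions` are all hypothetical choices.

```python
import numpy as np

def map_topic_proportions(counts, phi, alpha=1.1, step=0.5, n_iter=200):
    """Exponentiated-gradient (entropic mirror descent) MAP estimate of theta.

    counts: word counts of one document, shape (V,).
    phi: topic-word probabilities, shape (V, K), columns sum to 1.
    alpha: symmetric Dirichlet prior parameter (assumed > 1 here).
    """
    K = phi.shape[1]
    theta = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # Gradient of the negative log posterior with respect to theta.
        grad = -(phi.T @ (counts / (phi @ theta))) - (alpha - 1.0) / theta
        # Normalise by document length so the step size is length-independent.
        grad /= counts.sum()
        # Mirror step: multiplicative update followed by renormalisation.
        theta = theta * np.exp(-step * grad)
        theta /= theta.sum()
    return theta

# Two topics over a four-word vocabulary; the document mostly uses word 0,
# which is characteristic of topic 0.
phi = np.array([[0.7, 0.1],
                [0.1, 0.1],
                [0.1, 0.1],
                [0.1, 0.7]])
counts = np.array([8.0, 1.0, 1.0, 0.0])
theta = map_topic_proportions(counts, phi)
print(theta)
```

Each iteration is a differentiable function of `phi`, so a fixed number of such updates can be unrolled into layers, which is the sense in which inference forms a deep architecture that back propagation can pass through.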

* Proc. NIPS 2015
