Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yuke Wang

Utilizing Sequential Information of General Lab-test Results and Diagnoses History for Differential Diagnosis of Dementia

Feb 21, 2025

Yizong Xing, Dhita Putri Pratama, Yuke Wang, Yufan Zhang, Brian E. Chapman

Abstract:Early diagnosis of Alzheimer's Disease (AD) faces multiple data-related challenges, including high variability in patient data, limited access to specialized diagnostic tests, and overreliance on single-type indicators. These challenges are exacerbated by the progressive nature of AD, where subtle pathophysiological changes often precede clinical symptoms by decades. To address these limitations, this study proposes a novel approach that takes advantage of routinely collected general laboratory test histories for the early detection and differential diagnosis of AD. By modeling lab test sequences as "sentences", we apply word embedding techniques to capture latent relationships between tests and employ deep time series models, including long-short-term memory (LSTM) and Transformer networks, to model temporal patterns in patient records. Experimental results demonstrate that our approach improves diagnostic accuracy and enables scalable and costeffective AD screening in diverse clinical settings.

* 7 pages, 6 figures. This work has been submitted to the IEEE for possible publication

Via

Access Paper or Ask Questions

The Study of Perceptual Training of Chinese Mandarin Tones for Monolingual Speakers of English Using Adaptive Computer Based Training Software

Sep 24, 2023

Yuke Wang

Abstract:The study explored a new technique of phonetic tone training, which may have a positive impact on second language learning and tone training.

* 53 pages

Via

Access Paper or Ask Questions

Faith: An Efficient Framework for Transformer Verification on GPUs

Sep 23, 2022

Boyuan Feng, Tianqi Tang, Yuke Wang, Zhaodong Chen, Zheng Wang, Shu Yang, Yuan Xie, Yufei Ding

Figure 1 for Faith: An Efficient Framework for Transformer Verification on GPUs

Figure 2 for Faith: An Efficient Framework for Transformer Verification on GPUs

Figure 3 for Faith: An Efficient Framework for Transformer Verification on GPUs

Figure 4 for Faith: An Efficient Framework for Transformer Verification on GPUs

Abstract:Transformer verification draws increasing attention in machine learning research and industry. It formally verifies the robustness of transformers against adversarial attacks such as exchanging words in a sentence with synonyms. However, the performance of transformer verification is still not satisfactory due to bound-centric computation which is significantly different from standard neural networks. In this paper, we propose Faith, an efficient framework for transformer verification on GPUs. We first propose a semantic-aware computation graph transformation to identify semantic information such as bound computation in transformer verification. We exploit such semantic information to enable efficient kernel fusion at the computation graph level. Second, we propose a verification-specialized kernel crafter to efficiently map transformer verification to modern GPUs. This crafter exploits a set of GPU hardware supports to accelerate verification specialized operations which are usually memory-intensive. Third, we propose an expert-guided autotuning to incorporate expert knowledge on GPU backends to facilitate large search space exploration. Extensive evaluations show that Faith achieves $2.1\times$ to $3.4\times$ ($2.6\times$ on average) speedup over state-of-the-art frameworks.

* Published in ATC'22

Via

Access Paper or Ask Questions

Empowering GNNs with Fine-grained Communication-Computation Pipelining on Multi-GPU Platforms

Sep 14, 2022

Yuke Wang, Boyuan Feng, Zheng Wang, Tong Geng, Kevin Barker, Ang Li, Yufei Ding

Figure 1 for Empowering GNNs with Fine-grained Communication-Computation Pipelining on Multi-GPU Platforms

Figure 2 for Empowering GNNs with Fine-grained Communication-Computation Pipelining on Multi-GPU Platforms

Figure 3 for Empowering GNNs with Fine-grained Communication-Computation Pipelining on Multi-GPU Platforms

Figure 4 for Empowering GNNs with Fine-grained Communication-Computation Pipelining on Multi-GPU Platforms

Abstract:The increasing size of input graphs for graph neural networks (GNNs) highlights the demand for using multi-GPU platforms. However, existing multi-GPU GNN solutions suffer from inferior performance due to imbalanced computation and inefficient communication. To this end, we propose MGG, a novel system design to accelerate GNNs on multi-GPU platforms via a GPU-centric software pipeline. MGG explores the potential of hiding remote memory access latency in GNN workloads through fine-grained computation-communication pipelining. Specifically, MGG introduces a pipeline-aware workload management strategy and a hybrid data layout design to facilitate communication-computation overlapping. MGG implements an optimized pipeline-centric kernel. It includes workload interleaving and warp-based mapping for efficient GPU kernel operation pipelining and specialized memory designs and optimizations for better data access performance. Besides, MGG incorporates lightweight analytical modeling and optimization heuristics to dynamically improve the GNN execution performance for different settings at runtime. Comprehensive experiments demonstrate that MGG outperforms state-of-the-art multi-GPU systems across various GNN settings: on average 3.65X faster than multi-GPU systems with a unified virtual memory design and on average 7.38X faster than the DGCL framework.

Via

Access Paper or Ask Questions

TC-GNN: Accelerating Sparse Graph Neural Network Computation Via Dense Tensor Core on GPUs

Dec 03, 2021

Yuke Wang, Boyuan Feng, Yufei Ding

Figure 1 for TC-GNN: Accelerating Sparse Graph Neural Network Computation Via Dense Tensor Core on GPUs

Figure 2 for TC-GNN: Accelerating Sparse Graph Neural Network Computation Via Dense Tensor Core on GPUs

Figure 3 for TC-GNN: Accelerating Sparse Graph Neural Network Computation Via Dense Tensor Core on GPUs

Figure 4 for TC-GNN: Accelerating Sparse Graph Neural Network Computation Via Dense Tensor Core on GPUs

Abstract:Recently, graph neural networks (GNNs), as the backbone of graph-based machine learning, demonstrate great success in various domains (e.g., e-commerce). However, the performance of GNNs is usually unsatisfactory due to the highly sparse and irregular graph-based operations. To this end, we propose, TC-GNN, the first GPU Tensor Core Unit (TCU) based GNN acceleration framework. The core idea is to reconcile the "Sparse" GNN computation with "Dense" TCU. Specifically, we conduct an in-depth analysis of the sparse operations in mainstream GNN computing frameworks. We introduce a novel sparse graph translation technique to facilitate TCU processing of sparse GNN workload. We also implement an effective CUDA core and TCU collaboration design to fully utilize GPU resources. We fully integrate TC-GNN with the Pytorch framework for ease of programming. Rigorous experiments show an average of 1.70X speedup over the state-of-the-art Deep Graph Library framework across various GNN models and dataset settings.

Via

Access Paper or Ask Questions

Towards Efficient Ansatz Architecture for Variational Quantum Algorithms

Nov 26, 2021

Anbang Wu, Gushu Li, Yuke Wang, Boyuan Feng, Yufei Ding, Yuan Xie

Figure 1 for Towards Efficient Ansatz Architecture for Variational Quantum Algorithms

Figure 2 for Towards Efficient Ansatz Architecture for Variational Quantum Algorithms

Figure 3 for Towards Efficient Ansatz Architecture for Variational Quantum Algorithms

Figure 4 for Towards Efficient Ansatz Architecture for Variational Quantum Algorithms

Abstract:Variational quantum algorithms are expected to demonstrate the advantage of quantum computing on near-term noisy quantum computers. However, training such variational quantum algorithms suffers from gradient vanishing as the size of the algorithm increases. Previous work cannot handle the gradient vanishing induced by the inevitable noise effects on realistic quantum hardware. In this paper, we propose a novel training scheme to mitigate such noise-induced gradient vanishing. We first introduce a new cost function of which the gradients are significantly augmented by employing traceless observables in truncated subspace. We then prove that the same minimum can be reached by optimizing the original cost function with the gradients from the new cost function. Experiments show that our new training scheme is highly effective for major variational quantum algorithms of various tasks.

Via

Access Paper or Ask Questions

APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores

Jun 23, 2021

Boyuan Feng, Yuke Wang, Tong Geng, Ang Li, Yufei Ding

Figure 1 for APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores

Figure 2 for APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores

Figure 3 for APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores

Figure 4 for APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores

Abstract:Over the years, accelerating neural networks with quantization has been widely studied. Unfortunately, prior efforts with diverse precisions (e.g., 1-bit weights and 2-bit activations) are usually restricted by limited precision support on GPUs (e.g., int1 and int4). To break such restrictions, we introduce the first Arbitrary Precision Neural Network framework (APNN-TC) to fully exploit quantization benefits on Ampere GPU Tensor Cores. Specifically, APNN-TC first incorporates a novel emulation algorithm to support arbitrary short bit-width computation with int1 compute primitives and XOR/AND Boolean operations. Second, APNN-TC integrates arbitrary precision layer designs to efficiently map our emulation algorithm to Tensor Cores with novel batching strategies and specialized memory organization. Third, APNN-TC embodies a novel arbitrary precision NN design to minimize memory access across layers and further improve performance. Extensive evaluations show that APNN-TC can achieve significant speedup over CUTLASS kernels and various NN models, such as ResNet and VGG.

* Accepted by SC'21

Via

Access Paper or Ask Questions

DSXplore: Optimizing Convolutional Neural Networks via Sliding-Channel Convolutions

Jan 04, 2021

Yuke Wang, Boyuan Feng, Yufei Ding

Figure 1 for DSXplore: Optimizing Convolutional Neural Networks via Sliding-Channel Convolutions

Figure 2 for DSXplore: Optimizing Convolutional Neural Networks via Sliding-Channel Convolutions

Figure 3 for DSXplore: Optimizing Convolutional Neural Networks via Sliding-Channel Convolutions

Figure 4 for DSXplore: Optimizing Convolutional Neural Networks via Sliding-Channel Convolutions

Abstract:As the key advancement of the convolutional neural networks (CNNs), depthwise separable convolutions (DSCs) are becoming one of the most popular techniques to reduce the computations and parameters size of CNNs meanwhile maintaining the model accuracy. It also brings profound impact to improve the applicability of the compute- and memory-intensive CNNs to a broad range of applications, such as mobile devices, which are generally short of computation power and memory. However, previous research in DSCs are largely focusing on compositing the limited existing DSC designs, thus, missing the opportunities to explore more potential designs that can achieve better accuracy and higher computation/parameter reduction. Besides, the off-the-shelf convolution implementations offer limited computing schemes, therefore, lacking support for DSCs with different convolution patterns. To this end, we introduce, DSXplore, the first optimized design for exploring DSCs on CNNs. Specifically, at the algorithm level, DSXplore incorporates a novel factorized kernel -- sliding-channel convolution (SCC), featured with input-channel overlapping to balance the accuracy performance and the reduction of computation and memory cost. SCC also offers enormous space for design exploration by introducing adjustable kernel parameters. Further, at the implementation level, we carry out an optimized GPU-implementation tailored for SCC by leveraging several key techniques, such as the input-centric backward design and the channel-cyclic optimization. Intensive experiments on different datasets across mainstream CNNs show the advantages of DSXplore in balancing accuracy and computation/parameter reduction over the standard convolution and the existing DSCs.

Via

Access Paper or Ask Questions

Uncertainty-aware Attention Graph Neural Network for Defending Adversarial Attacks

Sep 22, 2020

Boyuan Feng, Yuke Wang, Zheng Wang, Yufei Ding

Figure 1 for Uncertainty-aware Attention Graph Neural Network for Defending Adversarial Attacks

Figure 2 for Uncertainty-aware Attention Graph Neural Network for Defending Adversarial Attacks

Figure 3 for Uncertainty-aware Attention Graph Neural Network for Defending Adversarial Attacks

Figure 4 for Uncertainty-aware Attention Graph Neural Network for Defending Adversarial Attacks

Abstract:With the increasing popularity of graph-based learning, graph neural networks (GNNs) emerge as the essential tool for gaining insights from graphs. However, unlike the conventional CNNs that have been extensively explored and exhaustively tested, people are still worrying about the GNNs' robustness under the critical settings, such as financial services. The main reason is that existing GNNs usually serve as a black-box in predicting and do not provide the uncertainty on the predictions. On the other side, the recent advancement of Bayesian deep learning on CNNs has demonstrated its success of quantifying and explaining such uncertainties to fortify CNN models. Motivated by these observations, we propose UAG, the first systematic solution to defend adversarial attacks on GNNs through identifying and exploiting hierarchical uncertainties in GNNs. UAG develops a Bayesian Uncertainty Technique (BUT) to explicitly capture uncertainties in GNNs and further employs an Uncertainty-aware Attention Technique (UAT) to defend adversarial attacks on GNNs. Intensive experiments show that our proposed defense approach outperforms the state-of-the-art solutions by a significant margin.

Via

Access Paper or Ask Questions

Scalable Adversarial Attack on Graph Neural Networks with Alternating Direction Method of Multipliers

Sep 22, 2020

Boyuan Feng, Yuke Wang, Xu Li, Yufei Ding

Figure 1 for Scalable Adversarial Attack on Graph Neural Networks with Alternating Direction Method of Multipliers

Figure 2 for Scalable Adversarial Attack on Graph Neural Networks with Alternating Direction Method of Multipliers

Figure 3 for Scalable Adversarial Attack on Graph Neural Networks with Alternating Direction Method of Multipliers

Figure 4 for Scalable Adversarial Attack on Graph Neural Networks with Alternating Direction Method of Multipliers

Abstract:Graph neural networks (GNNs) have achieved high performance in analyzing graph-structured data and have been widely deployed in safety-critical areas, such as finance and autonomous driving. However, only a few works have explored GNNs' robustness to adversarial attacks, and their designs are usually limited by the scale of input datasets (i.e., focusing on small graphs with only thousands of nodes). In this work, we propose, SAG, the first scalable adversarial attack method with Alternating Direction Method of Multipliers (ADMM). We first decouple the large-scale graph into several smaller graph partitions and cast the original problem into several subproblems. Then, we propose to solve these subproblems using projected gradient descent on both the graph topology and the node features that lead to considerably lower memory consumption compared to the conventional attack methods. Rigorous experiments further demonstrate that SAG can significantly reduce the computation and memory overhead compared with the state-of-the-art approach, making SAG applicable towards graphs with large size of nodes and edges.

Via

Access Paper or Ask Questions