Youhui Zhang

Pipelining Kruskal's: A Neuromorphic Approach for Minimum Spanning Tree

May 19, 2025

Yee Hin Chong, Peng Qu, Yuchen Li, Youhui Zhang

Abstract: Neuromorphic computing, characterized by its event-driven computation and massive parallelism, is particularly effective for handling data-intensive tasks in low-power environments, such as computing the minimum spanning tree (MST) for large-scale graphs. The introduction of dynamic synaptic modifications provides new design opportunities for neuromorphic algorithms. Building on this foundation, we propose a spiking neural network (SNN)-based union-sort routine and a pipelined version of Kruskal's algorithm for MST computation. The event-driven nature of our method allows for the concurrent execution of two completely decoupled stages: neuromorphic sorting and union-find. Our approach outperforms state-of-the-art Prim's-based methods on large-scale graphs from the DIMACS10 dataset, achieving speedups of 269.67x to 1283.80x, with a median speedup of 540.76x. We further evaluate the pipelined implementation against two serial variants of Kruskal's algorithm, which rely on neuromorphic sorting and neuromorphic radix sort, showing significant performance advantages in most scenarios.
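
The two-stage structure described in this abstract (a sorting stage feeding a union-find stage) can be illustrated with a conventional CPU sketch. This is not the paper's SNN implementation: a Python generator stands in for the neuromorphic sorting stage, and the names `sorted_edge_stream`, `DisjointSet`, and `kruskal_pipelined` are hypothetical, chosen only for illustration.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Edge = Tuple[float, int, int]  # (weight, u, v)


def sorted_edge_stream(edges: Iterable[Edge]) -> Iterator[Edge]:
    """Stage 1 (stand-in for the sorting stage): emit edges one at a
    time in non-decreasing weight order, so the consumer can start
    before the whole edge list is fully sorted."""
    heap = list(edges)
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)


class DisjointSet:
    """Stage 2 data structure: union-find with path halving and union by size."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> bool:
        """Merge the sets of a and b; return False if already joined."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True


def kruskal_pipelined(n: int, edges: Iterable[Edge]) -> List[Edge]:
    """Consume edges as the sorting stage emits them; stop once the
    tree has n - 1 edges, leaving the rest of the stream unsorted."""
    dsu = DisjointSet(n)
    mst: List[Edge] = []
    for w, u, v in sorted_edge_stream(edges):
        if dsu.union(u, v):
            mst.append((w, u, v))
            if len(mst) == n - 1:
                break
    return mst


example_edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
example_mst = kruskal_pipelined(4, example_edges)
```

In this sketch the stages only interleave rather than run concurrently; the early exit after `n - 1` accepted edges is what a lazy sorting stage buys on a single thread.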

AIPerf: Automated machine learning as an AI-HPC benchmark

Aug 26, 2020

Zhixiang Ren, Yongheng Liu, Tianhui Shi, Lei Xie, Yue Zhou, Jidong Zhai, Youhui Zhang, Yunquan Zhang, Wenguang Chen

Abstract: The plethora of complex artificial intelligence (AI) algorithms and available high performance computing (HPC) power stimulates the convergence of AI and HPC. The rapid development of AI components, in both the hardware and software domains, increases system heterogeneity, which makes fair and comprehensive benchmarking a challenge. Existing HPC and AI benchmarks fail to cover the variety of heterogeneous systems while providing a simple quantitative measurement that reflects the overall performance of large clusters for AI tasks. To address these challenges, we specify the requirements of an AI-HPC benchmark considering future scenarios and propose an end-to-end benchmark suite utilizing automated machine learning (AutoML) as a representative AI application. Its extremely high computational cost and high scalability make AutoML a desirable workload candidate for an AI-HPC benchmark. We implement the algorithms in a highly efficient and parallel way to ensure automatic adaptation to various systems with respect to the memory and number of AI accelerators. The benchmark is customizable in its back-end training framework and hyperparameters so as to achieve optimal performance on diverse systems. The major metric to quantify machine performance is floating-point operations per second (FLOPS), which is measured with a systematic and analytical approach. We also provide a regulated score as a complementary result to reflect hardware and software co-performance. We verify the benchmark's linear scalability at different scales, up to 16 nodes equipped with 128 GPUs, and evaluate its stability as well as reproducibility at discrete timestamps. The source code, specifications, and detailed procedures are publicly accessible on GitHub: https://github.com/AI-HPC-Research-Team/AIPerf.
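
As a rough illustration of what an analytical FLOPS measurement can look like, the sketch below counts the operations of a single dense layer from its shape and divides by elapsed time. It is not AIPerf's actual counting procedure, which this abstract does not detail; the function names and the choice of a dense layer are assumptions made for the example.

```python
def dense_layer_flops(batch: int, in_features: int, out_features: int) -> int:
    """Analytical operation count for one forward pass of y = xW + b.

    Assumption: each multiply and each add counts as one operation,
    so the matrix product costs 2 * batch * in * out and the bias
    adds another batch * out.
    """
    matmul_ops = 2 * batch * in_features * out_features
    bias_ops = batch * out_features
    return matmul_ops + bias_ops


def achieved_flops(total_ops: int, elapsed_seconds: float) -> float:
    """Operations per second for a counted workload and a measured time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_ops / elapsed_seconds


ops = dense_layer_flops(batch=32, in_features=1024, out_features=512)
rate = achieved_flops(ops, elapsed_seconds=0.5)
```

In practice the elapsed time would come from a wall-clock timer such as `time.perf_counter()` wrapped around the workload, and the count would be summed over every layer and training step.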

Brain-inspired global-local hybrid learning towards human-like intelligence

Jun 05, 2020

Yujie Wu, Rong Zhao, Jun Zhu, Feng Chen, Mingkun Xu, Guoqi Li, Sen Song, Lei Deng, Guanrui Wang, Hao Zheng (+4 more)

Abstract: The combination of neuroscience-oriented and computer-science-oriented approaches is the most promising method to develop artificial general intelligence (AGI) that can learn general tasks similar to humans. Currently, two main routes of learning exist: neuroscience-inspired methods, represented by local synaptic plasticity, and machine-learning methods, represented by backpropagation. Both have advantages and complement each other, but neither can solve all learning problems well. Integrating these two methods into one network may provide better learning abilities for general tasks. Here, we report a hybrid spiking neural network model that integrates the two approaches by introducing a meta-local module and a two-phase causality modelling method. The model can not only optimize local plasticity rules, but also receive top-down supervision information. In addition to flexibly supporting multiple spike-based coding schemes, we demonstrate that this model facilitates learning of many general tasks, including fault-tolerance learning, few-shot learning and multiple-task learning, and show its efficiency on the Tianjic neuromorphic platform. This work provides a new route for brain-inspired computing and facilitates AGI development.

* 5 figures, 2 tables

FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture

Jan 28, 2019

Yu Ji, Youyang Zhang, Xinfeng Xie, Shuangchen Li, Peiqi Wang, Xing Hu, Youhui Zhang, Yuan Xie

Abstract: Neural network (NN) accelerators with emerging ReRAM (resistive random access memory) technologies have been investigated as one of the promising solutions to address the memory wall challenge, due to the unique capability of processing-in-memory within ReRAM-crossbar-based processing elements (PEs). However, the high efficiency and high density advantages of ReRAM have not been fully utilized due to the huge communication demands among PEs and the overhead of peripheral circuits. In this paper, we propose a full system stack solution, composed of a reconfigurable architecture design, Field Programmable Synapse Array (FPSA), and its software system, including a neural synthesizer, a temporal-to-spatial mapper, and placement & routing. We rely heavily on the software system to make the hardware design compact and efficient. To satisfy the high-performance communication demand, we optimize it with a reconfigurable routing architecture and the placement & routing tool. To improve the computational density, we greatly simplify the PE circuit with the spiking schema and then adopt the neural synthesizer to enable the high-density computation resources to support different kinds of NN operations. In addition, we provide spiking memory blocks (SMBs) and configurable logic blocks (CLBs) in hardware and leverage the temporal-to-spatial mapper to utilize them to balance the storage and computation requirements of NNs. Owing to the end-to-end software system, we can efficiently deploy existing deep neural networks to FPSA. Evaluations show that, compared to one of the state-of-the-art ReRAM-based NN accelerators, PRIME, the computational density of FPSA improves by 31x; for representative NNs, its inference performance can achieve up to 1000x speedup.

* Accepted by ASPLOS 2019

Programmable Neural Network Trojan for Pre-Trained Feature Extractor

Jan 23, 2019

Yu Ji, Zixin Liu, Xing Hu, Peiqi Wang, Youhui Zhang

Abstract: The neural network (NN) trojaning attack is an emerging and important attack model that can broadly damage systems deployed with NN models. Existing studies have explored the outsourced training attack scenario and the transfer learning attack scenario on some small datasets for specific domains, with limited numbers of fixed target classes. In this paper, we propose a more powerful trojaning attack method for both the outsourced training attack and the transfer learning attack, which outperforms existing studies in capability, generality, and stealthiness. First, the attack is programmable: the malicious misclassification target is not fixed and can be generated on demand even after the victim's deployment. Second, our trojan attack is not limited to a small domain; one trojaned model on a large-scale dataset can affect applications in different domains that reuse its general features. Third, our trojan design is hard to detect or eliminate even if the victims fine-tune the whole model.
