Xiaojun Zhai

Fast, Scalable, Energy-Efficient Non-element-wise Matrix Multiplication on FPGA

Jul 02, 2024

Xuqi Zhu, Huaizhi Zhang, JunKyu Lee, Jiacheng Zhu, Chandrajit Pal, Sangeet Saha, Klaus D. McDonald-Maier, Xiaojun Zhai

Abstract: Modern Neural Network (NN) architectures rely heavily on vast numbers of multiply-accumulate operations, which constitute the predominant computational cost. This paper therefore proposes a high-throughput, scalable and energy-efficient non-element-wise matrix multiplication unit on FPGAs as a basic component of NNs. We first streamline the inter-layer and intra-layer redundancies of the MADDNESS algorithm, a LUT-based approximate matrix multiplication, to design a fast, efficient and scalable approximate matrix multiplication module termed the "Approximate Multiplication Unit (AMU)". The AMU further optimizes LUT-based matrix multiplication through dedicated memory management and access design, decoupling computational overhead from input resolution and significantly boosting the efficiency of FPGA-based NN accelerators. The experimental results show that our AMU achieves up to 9x higher throughput and 112x higher energy efficiency than state-of-the-art solutions for FPGA-based Quantised Neural Network (QNN) accelerators.
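The general idea behind LUT-based approximate matrix multiplication can be sketched in a few lines of Python. This is a minimal product-quantization-style sketch, assuming hand-picked prototypes and nearest-prototype encoding; it is not the paper's AMU design, and MADDNESS itself learns its prototypes and encodes inputs with a tree-based hash instead of a distance search. Each input row is split into C subspaces, each subspace is replaced by the index of a prototype, and the products of every prototype with the weight matrix B are precomputed into lookup tables, so each output becomes a sum of C table entries. The function names (`encode`, `build_luts`, `approx_matmul`) are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def encode(row: Vector, prototypes: List[Matrix], sub_len: int) -> List[int]:
    """For each subspace c, return the index of the nearest prototype (squared L2).

    Assumption: nearest-prototype search stands in for MADDNESS's learned hash.
    """
    codes = []
    for c, protos in enumerate(prototypes):
        sub = row[c * sub_len:(c + 1) * sub_len]
        codes.append(min(range(len(protos)),
                         key=lambda k: sum((a - b) ** 2 for a, b in zip(sub, protos[k]))))
    return codes


def build_luts(prototypes: List[Matrix], B: Matrix, sub_len: int) -> List[Matrix]:
    """luts[c][k][j] = dot(prototype k of subspace c, column j of B restricted to subspace c)."""
    n_out = len(B[0])
    luts = []
    for c, protos in enumerate(prototypes):
        rows = B[c * sub_len:(c + 1) * sub_len]
        cols = [[r[j] for r in rows] for j in range(n_out)]
        luts.append([[dot(p, col) for col in cols] for p in protos])
    return luts


def approx_matmul(A: Matrix, B: Matrix, prototypes: List[Matrix], sub_len: int) -> Matrix:
    """Approximate A @ B: once a row is encoded, each output is a sum of C table lookups."""
    luts = build_luts(prototypes, B, sub_len)  # computed once per weight matrix B
    out = []
    for row in A:
        codes = encode(row, prototypes, sub_len)
        out.append([sum(luts[c][k][j] for c, k in enumerate(codes)) for j in range(len(B[0]))])
    return out


# Toy example: D = 4 input features split into C = 2 subspaces of length 2,
# with K = 2 hypothetical, hand-picked prototypes per subspace.
prototypes = [[[1, 0], [0, 1]], [[1, 1], [0, 0]]]
B = [[1, 2], [3, 4], [5, 6], [7, 8]]
A = [[1, 0, 1, 1], [0, 1, 0, 0]]  # rows that exactly match prototypes
print(approx_matmul(A, B, prototypes, sub_len=2))  # [[13, 16], [3, 4]], same as exact A @ B
```

A slightly perturbed row such as `[0.9, 0.1, 1.1, 0.8]` maps to the same codes and still returns `[13, 16]`, whereas the exact product is `[12.3, 15.2]`; that gap is the approximation error. Note that the cost of the lookup stage depends on the number of subspaces and prototypes, not on the bit-width of the input values.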

An Evaluation and Ranking of Different Voting Schemes for Improved Visual Place Recognition

May 09, 2023

Maria Waheed, Michael Milford, Xiaojun Zhai, Klaus McDonald-Maier, Shoaib Ehsan

Abstract: Visual Place Recognition (VPR) has recently seen a surge of efforts that use ensemble approaches to improve VPR performance. Ideas such as multi-process fusion or switching combine different VPR techniques using different strategies. One aspect common to many of these strategies is voting. Voting is widely used in ensemble methods, so its application and significance for improving VPR performance are a relevant subject to explore. This paper analyzes a variety of voting schemes in detail to evaluate which voting technique is optimal for an ensemble VPR setup. We take inspiration from voting schemes that exist and are widely employed in other research fields such as politics and sociology. The idea is motivated by the observation that different voting methods produce different outcomes for the same data, and that each voting scheme is used for specific cases in different academic fields. These voting schemes include Condorcet voting, Borda Count and Plurality voting. Voting in any context requires establishing a fair system that outputs the best and most favourable results, which in our case means improving VPR performance. We evaluate some of these voting techniques in a standardized test of different VPR techniques, using a variety of VPR datasets. We aim to determine whether a single optimal voting scheme exists or whether, as in other research fields, the choice of voting technique depends on its application and environment. We also aim to propose a ranking of these voting methods from best to worst based on our results, to enable better selection of voting schemes.
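Plurality, Borda Count and Condorcet voting can disagree on the same ballots. The sketch below is a minimal illustration, assuming each VPR technique contributes a full ranking of the same candidate places (best match first) and breaking ties alphabetically; the place labels and ballots are hypothetical, and the paper's exact ensemble setup is not reproduced here.

```python
from collections import Counter
from typing import List, Optional

Ranking = List[str]  # candidate place IDs, best match first


def plurality(rankings: List[Ranking]) -> str:
    """Winner = candidate ranked first by the most voters."""
    firsts = Counter(r[0] for r in rankings)
    return max(sorted(firsts), key=lambda c: firsts[c])  # ties -> alphabetically first


def borda(rankings: List[Ranking]) -> str:
    """Each voter gives n-1 points to its top choice, n-2 to the next, ..., 0 to the last."""
    scores: Counter = Counter()
    for r in rankings:
        for pos, cand in enumerate(r):
            scores[cand] += len(r) - 1 - pos
    return max(sorted(scores), key=lambda c: scores[c])


def condorcet(rankings: List[Ranking]) -> Optional[str]:
    """Return the candidate that beats every other one head-to-head by a strict majority."""
    candidates = sorted(set(rankings[0]))
    for a in candidates:
        if all(2 * sum(r.index(a) < r.index(b) for r in rankings) > len(rankings)
               for b in candidates if b != a):
            return a
    return None  # cyclic preferences: no Condorcet winner exists


# Hypothetical ballots from 7 VPR techniques over reference places A, B, C.
ballots = 3 * [["A", "B", "C"]] + 2 * [["B", "C", "A"]] + 2 * [["C", "B", "A"]]
print(plurality(ballots), borda(ballots), condorcet(ballots))  # A B B
```

Plurality picks A because it has the most first places, but a majority of techniques prefer B to both A and C, so Borda Count (scores A = 6, B = 9, C = 6) and Condorcet agree on B. With cyclic ballots such as A>B>C, B>C>A, C>A>B, `condorcet` returns `None`.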

Using Machine Learning for Anomaly Detection on a System-on-Chip under Gamma Radiation

Jan 05, 2022

Eduardo Weber Wachter, Server Kasap, Sefki Kolozali, Xiaojun Zhai, Shoaib Ehsan, Klaus McDonald-Maier

Abstract: The emergence of new nanoscale technologies has imposed significant challenges on designing reliable electronic systems for radiation environments. Some radiation effects, such as Total Ionizing Dose (TID), often cause permanent damage to such nanoscale electronic devices, and current state-of-the-art approaches to tackling TID rely on expensive radiation-hardened devices. This paper focuses on a novel and different approach: using machine learning algorithms on consumer-grade Field Programmable Gate Arrays (FPGAs) to tackle TID effects and monitor them so that boards can be replaced before they stop working. The research challenge is to anticipate when a board will fail completely due to TID effects. We observed internal measurements of the FPGA boards under gamma radiation and used three different anomaly detection machine learning (ML) algorithms to detect anomalies in the sensor measurements in a gamma-irradiated environment. The statistical results show a highly significant relationship between the gamma radiation exposure levels and the board measurements. Moreover, our anomaly detection results show that a One-Class Support Vector Machine with a Radial Basis Function kernel achieves an average Recall score of 0.95. In addition, all anomalies can be detected before the boards stop working.
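The one-class detection setup and the Recall metric mentioned above can be illustrated with scikit-learn's `OneClassSVM` and an RBF kernel. This is a minimal sketch on synthetic two-channel "sensor" data; the data, the `nu` value and the drifted anomalous cluster are assumptions for illustration and do not reproduce the paper's measurements or its other two algorithms. The detector is fitted on healthy readings only, and recall is the fraction of true anomalies it flags, TP / (TP + FN).

```python
import numpy as np
from sklearn.metrics import recall_score
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical healthy readings from two on-chip sensors (synthetic, normalised values).
X_train = rng.normal(loc=[0.0, 0.0], scale=[1.0, 1.0], size=(500, 2))

# Fit on normal data only; nu bounds the fraction of training points treated as outliers.
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)

# Evaluation set: fresh healthy readings plus readings drifted far from the healthy
# region, standing in (as an assumption) for radiation-induced degradation.
X_normal = rng.normal(loc=[0.0, 0.0], scale=[1.0, 1.0], size=(100, 2))
X_anomalous = rng.normal(loc=[8.0, -8.0], scale=[0.5, 0.5], size=(20, 2))
X_test = np.vstack([X_normal, X_anomalous])
y_true = np.array([0] * len(X_normal) + [1] * len(X_anomalous))  # 1 = anomaly

# predict() returns +1 for inliers and -1 for outliers; map -1 to the anomaly label 1.
y_pred = (detector.predict(X_test) == -1).astype(int)

# Recall = TP / (TP + FN): the share of true anomalies that the detector flagged.
print("recall:", recall_score(y_true, y_pred))
print("false-alarm rate on healthy data:", y_pred[: len(X_normal)].mean())
```

Because the model sees only healthy data during training, the same pattern applies when failure examples are scarce: anything far from the learned healthy region is reported as anomalous.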

HEMELB Acceleration and Visualization for Cerebral Aneurysms

Jun 27, 2019

Sahar Soheilian Esfahani, Xiaojun Zhai, Minsi Chen, Abbes Amira, Faycal Bensaali, Julien AbiNahed, Sarada Dakua, Georges Younes, Robin A. Richardson, Peter V. Coveney

Abstract: A cerebral aneurysm is a weakness in the wall of a cerebral artery that causes a dilation or ballooning of the blood vessel. Optimal treatment requires fast and accurate diagnosis of the aneurysm. HemeLB is a fluid dynamics solver for complex geometries, developed to provide neurosurgeons with information on the flow of blood in and around aneurysms. On a cost-efficient platform, HemeLB could be employed in hospitals to provide surgeons with simulation results in real time. In this work, we developed an improved version of HemeLB for GPU implementation and result visualization. A visualization platform for smooth interaction with end users is also presented. Finally, a comprehensive evaluation of this implementation is reported. The results demonstrate that the proposed implementation achieves a maximum performance of 15,168,964 site updates per second and is capable of speeding up HemeLB for deployment in hospitals and clinical investigations.
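For lattice-Boltzmann solvers such as HemeLB, throughput is conventionally reported as site updates per second (SUPS): the number of fluid lattice sites times the number of time steps, divided by wall-clock time. As a worked example with a hypothetical problem size (not taken from the paper), and assuming each fluid site is counted once per time step, 10^6 fluid sites advanced for 10^3 steps at the reported peak rate would take roughly 66 s:

```latex
\text{SUPS} = \frac{N_{\text{sites}} \times N_{\text{steps}}}{t_{\text{wall}}}
\quad\Longrightarrow\quad
t_{\text{wall}} = \frac{10^{6} \times 10^{3}}{15\,168\,964\ \text{s}^{-1}} \approx 65.9\ \text{s}
```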
