Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Keshab K. Parhi

A Survey of Attacks on Large Language Models

May 18, 2025

Wenrui Xu, Keshab K. Parhi

Abstract:Large language models (LLMs) and LLM-based agents have been widely deployed in a wide range of applications in the real world, including healthcare diagnostics, financial analysis, customer support, robotics, and autonomous driving, expanding their powerful capability of understanding, reasoning, and generating natural languages. However, the wide deployment of LLM-based applications exposes critical security and reliability risks, such as the potential for malicious misuse, privacy leakage, and service disruption that weaken user trust and undermine societal safety. This paper provides a systematic overview of the details of adversarial attacks targeting both LLMs and LLM-based agents. These attacks are organized into three phases in LLMs: Training-Phase Attacks, Inference-Phase Attacks, and Availability & Integrity Attacks. For each phase, we analyze the details of representative and recently introduced attack methods along with their corresponding defenses. We hope our survey will provide a good tutorial and a comprehensive understanding of LLM security, especially for attacks on LLMs. We desire to raise attention to the risks inherent in widely deployed LLM-based applications and highlight the urgent need for robust mitigation strategies for evolving threats.

Via

Access Paper or Ask Questions

SpikePipe: Accelerated Training of Spiking Neural Networks via Inter-Layer Pipelining and Multiprocessor Scheduling

Jun 11, 2024

Sai Sanjeet, Bibhu Datta Sahoo, Keshab K. Parhi

Figure 1 for SpikePipe: Accelerated Training of Spiking Neural Networks via Inter-Layer Pipelining and Multiprocessor Scheduling

Figure 2 for SpikePipe: Accelerated Training of Spiking Neural Networks via Inter-Layer Pipelining and Multiprocessor Scheduling

Figure 3 for SpikePipe: Accelerated Training of Spiking Neural Networks via Inter-Layer Pipelining and Multiprocessor Scheduling

Figure 4 for SpikePipe: Accelerated Training of Spiking Neural Networks via Inter-Layer Pipelining and Multiprocessor Scheduling

Abstract:Spiking Neural Networks (SNNs) have gained popularity due to their high energy efficiency. Prior works have proposed various methods for training SNNs, including backpropagation-based methods. Training SNNs is computationally expensive compared to their conventional counterparts and would benefit from multiprocessor hardware acceleration. This is the first paper to propose inter-layer pipelining to accelerate training in SNNs using systolic array-based processors and multiprocessor scheduling. The impact of training using delayed gradients is observed using three networks training on different datasets, showing no degradation for small networks and < 10% degradation for large networks. The mapping of various training tasks of the SNN onto systolic arrays is formulated, and the proposed scheduling method is evaluated on the three networks. The results are compared against standard pipelining algorithms. The results show that the proposed method achieves an average speedup of 1.6X compared to standard pipelining algorithms, with an upwards of 2X improvement in some cases. The incurred communication overhead due to the proposed method is less than 0.5% of the total required communication of training.

Via

Access Paper or Ask Questions

Robust Clustering using Hyperdimensional Computing

Dec 05, 2023

Lulu Ge, Keshab K. Parhi

Abstract:This paper addresses the clustering of data in the hyperdimensional computing (HDC) domain. In prior work, an HDC-based clustering framework, referred to as HDCluster, has been proposed. However, the performance of the existing HDCluster is not robust. The performance of HDCluster is degraded as the hypervectors for the clusters are chosen at random during the initialization step. To overcome this bottleneck, we assign the initial cluster hypervectors by exploring the similarity of the encoded data, referred to as \textit{query} hypervectors. Intra-cluster hypervectors have a higher similarity than inter-cluster hypervectors. Harnessing the similarity results among query hypervectors, this paper proposes four HDC-based clustering algorithms: similarity-based k-means, equal bin-width histogram, equal bin-height histogram, and similarity-based affinity propagation. Experimental results illustrate that: (i) Compared to the existing HDCluster, our proposed HDC-based clustering algorithms can achieve better accuracy, more robust performance, fewer iterations, and less execution time. Similarity-based affinity propagation outperforms the other three HDC-based clustering algorithms on eight datasets by 2~38% in clustering accuracy. (ii) Even for one-pass clustering, i.e., without any iterative update of the cluster hypervectors, our proposed algorithms can provide more robust clustering accuracy than HDCluster. (iii) Over eight datasets, five out of eight can achieve higher or comparable accuracy when projected onto the hyperdimensional space. Traditional clustering is more desirable than HDC when the number of clusters, $k$, is large.

Via

Access Paper or Ask Questions

Systematic Design and Optimization of Quantum Circuits for Stabilizer Codes

Sep 21, 2023

Arijit Mondal, Keshab K. Parhi

Figure 1 for Systematic Design and Optimization of Quantum Circuits for Stabilizer Codes

Figure 2 for Systematic Design and Optimization of Quantum Circuits for Stabilizer Codes

Figure 3 for Systematic Design and Optimization of Quantum Circuits for Stabilizer Codes

Figure 4 for Systematic Design and Optimization of Quantum Circuits for Stabilizer Codes

Abstract:Quantum computing is an emerging technology that has the potential to achieve exponential speedups over their classical counterparts. To achieve quantum advantage, quantum principles are being applied to fields such as communications, information processing, and artificial intelligence. However, quantum computers face a fundamental issue since quantum bits are extremely noisy and prone to decoherence. Keeping qubits error free is one of the most important steps towards reliable quantum computing. Different stabilizer codes for quantum error correction have been proposed in past decades and several methods have been proposed to import classical error correcting codes to the quantum domain. However, formal approaches towards the design and optimization of circuits for these quantum encoders and decoders have so far not been proposed. In this paper, we propose a formal algorithm for systematic construction of encoding circuits for general stabilizer codes. This algorithm is used to design encoding and decoding circuits for an eight-qubit code. Next, we propose a systematic method for the optimization of the encoder circuit thus designed. Using the proposed method, we optimize the encoding circuit in terms of the number of 2-qubit gates used. The proposed optimized eight-qubit encoder uses 18 CNOT gates and 4 Hadamard gates, as compared to 14 single qubit gates, 33 2-qubit gates, and 6 CCNOT gates in a prior work. The encoder and decoder circuits are verified using IBM Qiskit. We also present optimized encoder circuits for Steane code and a 13-qubit code in terms of the number of gates used.

* arXiv admin note: substantial text overlap with arXiv:2309.11793

Via

Access Paper or Ask Questions

Quantum Circuits for Stabilizer Error Correcting Codes: A Tutorial

Sep 21, 2023

Arijit Mondal, Keshab K. Parhi

Abstract:Quantum computers have the potential to provide exponential speedups over their classical counterparts. Quantum principles are being applied to fields such as communications, information processing, and artificial intelligence to achieve quantum advantage. However, quantum bits are extremely noisy and prone to decoherence. Thus, keeping the qubits error free is extremely important toward reliable quantum computing. Quantum error correcting codes have been studied for several decades and methods have been proposed to import classical error correcting codes to the quantum domain. However, circuits for such encoders and decoders haven't been explored in depth. This paper serves as a tutorial on designing and simulating quantum encoder and decoder circuits for stabilizer codes. We present encoding and decoding circuits for five-qubit code and Steane code, along with verification of these circuits using IBM Qiskit. We also provide nearest neighbour compliant encoder and decoder circuits for the five-qubit code.

Via

Access Paper or Ask Questions

A Low-Latency FFT-IFFT Cascade Architecture

Sep 16, 2023

Keshab K. Parhi

Abstract:This paper addresses the design of a partly-parallel cascaded FFT-IFFT architecture that does not require any intermediate buffer. Folding can be used to design partly-parallel architectures for FFT and IFFT. While many cascaded FFT-IFFT architectures can be designed using various folding sets for the FFT and the IFFT, for a specified folded FFT architecture, there exists a unique folding set to design the IFFT architecture that does not require an intermediate buffer. Such a folding set is designed by processing the output of the FFT as soon as possible (ASAP) in the folded IFFT. Elimination of the intermediate buffer reduces latency and saves area. The proposed approach is also extended to interleaved processing of multi-channel time-series. The proposed FFT-IFFT cascade architecture saves about N/2 memory elements and N/4 clock cycles of latency compared to a design with identical folding sets. For the 2-interleaved FFT-IFFT cascade, the memory and latency savings are, respectively, N/2 units and N/2 clock cycles, compared to a design with identical folding sets.

Via

Access Paper or Ask Questions

NTT-Based Polynomial Modular Multiplication for Homomorphic Encryption: A Tutorial

Jun 21, 2023

Sin-Wei Chiu, Keshab K. Parhi

Abstract:Homomorphic Encryption (HE) allows any third party to operate on the encrypted data without decrypting it in advance. For the majority of HE schemes, the multiplicative depth of circuits is the main practical limitation in performing computations over encrypted data. Hence, Homomorphic multiplication is one of the most important components of homomorphic encryption. Since most of the HE schemes are constructed from the ring-learning with errors (R-LWE) problem. Efficient polynomial modular multiplication implementation becomes critical. This work consists of describing various approaches to implementing polynomial modular multiplication based on number theoretic transform.

* 6 pages

Via

Access Paper or Ask Questions

Tensor Decomposition for Model Reduction in Neural Networks: A Review

Apr 26, 2023

Xingyi Liu, Keshab K. Parhi

Abstract:Modern neural networks have revolutionized the fields of computer vision (CV) and Natural Language Processing (NLP). They are widely used for solving complex CV tasks and NLP tasks such as image classification, image generation, and machine translation. Most state-of-the-art neural networks are over-parameterized and require a high computational cost. One straightforward solution is to replace the layers of the networks with their low-rank tensor approximations using different tensor decomposition methods. This paper reviews six tensor decomposition methods and illustrates their ability to compress model parameters of convolutional neural networks (CNNs), recurrent neural networks (RNNs) and Transformers. The accuracy of some compressed models can be higher than the original versions. Evaluations indicate that tensor decompositions can achieve significant reductions in model size, run-time and energy consumption, and are well suited for implementing neural networks on edge devices.

* IEEE Circuits and Systems Magazine, 2023

Via

Access Paper or Ask Questions

Multi-Channel FFT Architectures Designed via Folding and Interleaving

Feb 19, 2022

Nanda K. Unnikrishnan, Keshab K. Parhi

Figure 1 for Multi-Channel FFT Architectures Designed via Folding and Interleaving

Figure 2 for Multi-Channel FFT Architectures Designed via Folding and Interleaving

Figure 3 for Multi-Channel FFT Architectures Designed via Folding and Interleaving

Figure 4 for Multi-Channel FFT Architectures Designed via Folding and Interleaving

Abstract:Computing the FFT of a single channel is well understood in the literature. However, computing the FFT of multiple channels in a systematic manner has not been fully addressed. This paper presents a framework to design a family of multi-channel FFT architectures using {\em folding} and {\em interleaving}. Three distinct multi-channel FFT architectures are presented in this paper. These architectures differ in the input and output preprocessing steps and are based on different folding sets, i.e., different orders of execution.

* Proc. 2022 IEEE International Symposium on Circuits and Systems (ISCAS)

Via

Access Paper or Ask Questions

LayerPipe: Accelerating Deep Neural Network Training by Intra-Layer and Inter-Layer Gradient Pipelining and Multiprocessor Scheduling

Aug 14, 2021

Nanda K. Unnikrishnan, Keshab K. Parhi

Figure 1 for LayerPipe: Accelerating Deep Neural Network Training by Intra-Layer and Inter-Layer Gradient Pipelining and Multiprocessor Scheduling

Figure 2 for LayerPipe: Accelerating Deep Neural Network Training by Intra-Layer and Inter-Layer Gradient Pipelining and Multiprocessor Scheduling

Figure 3 for LayerPipe: Accelerating Deep Neural Network Training by Intra-Layer and Inter-Layer Gradient Pipelining and Multiprocessor Scheduling

Figure 4 for LayerPipe: Accelerating Deep Neural Network Training by Intra-Layer and Inter-Layer Gradient Pipelining and Multiprocessor Scheduling

Abstract:The time required for training the neural networks increases with size, complexity, and depth. Training model parameters by backpropagation inherently creates feedback loops. These loops hinder efficient pipelining and scheduling of the tasks within the layer and between consecutive layers. Prior approaches, such as PipeDream, have exploited the use of delayed gradient to achieve inter-layer pipelining. However, these approaches treat the entire backpropagation as a single task; this leads to an increase in computation time and processor underutilization. This paper presents novel optimization approaches where the gradient computations with respect to the weights and the activation functions are considered independently; therefore, these can be computed in parallel. This is referred to as intra-layer optimization. Additionally, the gradient computation with respect to the activation function is further divided into two parts and distributed to two consecutive layers. This leads to balanced scheduling where the computation time of each layer is the same. This is referred to as inter-layer optimization. The proposed system, referred to as LayerPipe, reduces the number of clock cycles required for training while maximizing processor utilization with minimal inter-processor communication overhead. LayerPipe achieves an average speedup of 25% and upwards of 80% with 7 to 9 processors with less communication overhead when compared to PipeDream.

* Proc. of the 2021 IEEE International Conference on Computer Aided Design (ICCAD)

Via

Access Paper or Ask Questions