Vijaykrishnan Narayanan

Sigma-Delta Neural Network Conversion on Loihi 2

May 09, 2025

Matthew Brehove, Sadia Anjum Tumpa, Espoir Kyubwa, Naresh Menon, Vijaykrishnan Narayanan

Abstract: Neuromorphic computing aims to improve the efficiency of artificial neural networks by taking inspiration from biological neurons and leveraging temporal sparsity, spatial sparsity, and compute near/in memory. Although these approaches have shown efficiency gains, training spiking neural networks (SNNs) remains difficult. The original attempts at converting trained conventional analog neural networks (ANNs) to SNNs used the rate of binary spikes to represent neuron activations. This required many simulation time steps per inference, which degraded efficiency. Intel's Loihi 2 is a neuromorphic platform that supports graded spikes, which can be used to represent changes in neuron activation. In this work, we use Loihi 2's graded spikes to develop a method for converting ANNs to spiking networks that take advantage of temporal and spatial sparsity. We evaluated the performance of this network on Loihi 2 and compared it to NVIDIA's Jetson Xavier edge AI platform.
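
The idea of sending graded values that represent changes in activation, rather than a rate of binary spikes, can be illustrated with a minimal sigma-delta sketch. This is not the paper's conversion method and does not use any Loihi 2 API; the function names `delta_encode` and `sigma_decode`, the threshold of 0.1, and the test signal are all assumptions made for illustration.

```python
import numpy as np

def delta_encode(activations, threshold=0.1):
    """Send a graded message only where the activation has drifted more than
    `threshold` away from the last value the receiver knows about."""
    reference = np.zeros_like(activations[0])   # receiver's current estimate
    messages = []
    for act in activations:
        diff = act - reference
        send = np.abs(diff) > threshold         # which neurons communicate this step
        msg = np.where(send, diff, 0.0)         # graded payload, zero means silent
        reference = reference + msg
        messages.append(msg)
    return np.stack(messages)

def sigma_decode(messages):
    """Accumulate the graded messages over time to rebuild the activations."""
    return np.cumsum(messages, axis=0)

# Hypothetical input: one slowly varying neuron and one constant neuron.
t = np.linspace(0.0, 1.0, 50)
activations = np.stack([np.sin(2 * np.pi * t), np.full_like(t, 0.5)], axis=1)

messages = delta_encode(activations)
reconstructed = sigma_decode(messages)
sparsity = float(np.mean(messages == 0.0))      # fraction of silent (step, neuron) slots
max_error = float(np.max(np.abs(reconstructed - activations)))
print(f"silent fraction: {sparsity:.2f}, max reconstruction error: {max_error:.3f}")
```

In this sketch the constant neuron transmits once and then stays silent, and the reconstruction error never exceeds the threshold, which is the trade-off between message sparsity and precision that such schemes rely on.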

Disharmony: Forensics using Reverse Lighting Harmonization

Jan 17, 2025

Philip Wootaek Shin, Jack Sampson, Vijaykrishnan Narayanan, Andres Marquez, Mahantesh Halappanavar

Abstract: Content generation and manipulation approaches based on deep learning methods have seen significant advancements, leading to an increased need for techniques to detect whether an image has been generated or edited. Another area of research focuses on the insertion and harmonization of objects within images. In this study, we explore the potential of using harmonization data in conjunction with a segmentation model to enhance the detection of edited image regions. These edits can be either manually crafted or generated using deep learning methods. Our findings demonstrate that this approach can effectively identify such edits. Existing forensic models often overlook the detection of harmonized objects in relation to the background, but our proposed Disharmony Network addresses this gap. By utilizing an aggregated dataset of harmonization techniques, our model outperforms existing forensic networks in identifying harmonized objects integrated into their backgrounds, and shows potential for detecting various forms of edits, including virtual try-on tasks.

KALAHash: Knowledge-Anchored Low-Resource Adaptation for Deep Hashing

Dec 27, 2024

Shu Zhao, Tan Yu, Xiaoshuai Hao, Wenchao Ma, Vijaykrishnan Narayanan

Abstract: Deep hashing has been widely used for large-scale approximate nearest neighbor search due to its storage and search efficiency. However, existing deep hashing methods predominantly rely on abundant training data, leaving the more challenging scenario of low-resource adaptation for deep hashing relatively underexplored. This setting involves adapting pre-trained models to downstream tasks with only an extremely small number of training samples available. Our preliminary benchmarks reveal that current methods suffer significant performance degradation due to the distribution shift caused by limited training samples. To address these challenges, we introduce Class-Calibration LoRA (CLoRA), a novel plug-and-play approach that dynamically constructs low-rank adaptation matrices by leveraging class-level textual knowledge embeddings. CLoRA effectively incorporates prior class knowledge as anchors, enabling parameter-efficient fine-tuning while maintaining the original data distribution. Furthermore, we propose Knowledge-Guided Discrete Optimization (KIDDO), a framework that uses class knowledge to compensate for the scarcity of visual information and enhance the discriminability of hash codes. Extensive experiments demonstrate that our proposed method, Knowledge-Anchored Low-Resource Adaptation Hashing (KALAHash), significantly boosts retrieval performance and achieves 4x data efficiency in low-resource scenarios.

* Accepted at AAAI 2025

PIFS-Rec: Process-In-Fabric-Switch for Large-Scale Recommendation System Inferences

Sep 25, 2024

Pingyi Huo, Anusha Devulapally, Hasan Al Maruf, Minseo Park, Krishnakumar Nair, Meena Arunachalam, Gulsum Gudukbay Akbulut, Mahmut Taylan Kandemir, Vijaykrishnan Narayanan

Abstract: Deep Learning Recommendation Models (DLRMs) have become increasingly popular and prevalent in today's datacenters, consuming most of the AI inference cycles. The performance of DLRMs is heavily influenced by available bandwidth due to their large vector sizes in embedding tables and concurrent accesses. To achieve substantial improvements over existing solutions, novel approaches to DLRM optimization are needed, especially in the context of emerging interconnect technologies like CXL. This study explores CXL-enabled systems and implements a process-in-fabric-switch (PIFS) solution to accelerate DLRMs while optimizing their memory and bandwidth scalability. We present an in-depth characterization of industry-scale DLRM workloads running on CXL-ready systems, identifying the predominant bottlenecks in existing CXL systems. We therefore propose PIFS-Rec, a PIFS-based scheme that implements near-data processing through downstream ports of the fabric switch. PIFS-Rec achieves a latency that is 3.89x lower than Pond, an industry-standard CXL-based system, and also outperforms BEACON, a state-of-the-art scheme, by 2.03x.

Can Prompt Modifiers Control Bias? A Comparative Analysis of Text-to-Image Generative Models

Jun 09, 2024

Philip Wootaek Shin, Jihyun Janice Ahn, Wenpeng Yin, Jack Sampson, Vijaykrishnan Narayanan

Abstract: It has been shown that many generative models inherit and amplify societal biases. To date, there is no uniform, systematically agreed standard to control or adjust for these biases. This study examines the presence and manipulation of societal biases in leading text-to-image models: Stable Diffusion, DALL-E 3, and Adobe Firefly. Through a comprehensive analysis combining base prompts with modifiers and their sequencing, we uncover the nuanced ways these AI technologies encode biases across gender, race, geography, and region/culture. Our findings reveal the challenges and potential of prompt engineering in controlling biases, highlighting the critical need for ethical AI development promoting diversity and inclusivity. This work advances AI ethics by not only revealing the nuanced dynamics of bias in text-to-image generation models but also by offering a novel framework for future research in controlling bias. Our contributions, spanning comparative analyses, the strategic use of prompt modifiers, the exploration of prompt sequencing effects, and the introduction of a bias sensitivity taxonomy, lay the groundwork for the development of common metrics and standard analyses for evaluating whether and how future AI models exhibit and respond to requests to adjust for inherent biases.

Reimagining Sense Amplifiers: Harnessing Phase Transition Materials for Current and Voltage Sensing

Aug 30, 2023

Md Mazharul Islam, Shamiul Alam, Mohammad Adnan Jahangir, Garrett S. Rose, Suman Datta, Vijaykrishnan Narayanan, Sumeet Kumar Gupta, Ahmedullah Aziz

Abstract: Energy-efficient sense amplifier (SA) circuits are essential for reliable detection of stored memory states in emerging memory systems. In this work, we present four novel SA topologies based on phase transition material (PTM) tailored for non-volatile memory applications. We utilize the abrupt switching and volatile hysteretic characteristics of PTMs, which enable efficient and fast sensing operation in our proposed SA topologies. We provide comprehensive details of their functionality and assess how process variations impact their performance metrics. Our proposed SA topologies show notable performance enhancement. We achieve a ~67% reduction in sensing delay and a ~80% decrease in sensing power for current sensing. For voltage sensing, we achieve a ~75% reduction in sensing delay and a ~33% decrease in sensing power. Moreover, the proposed SA topologies exhibit improved variation robustness compared to conventional SAs. We also scrutinize the dependence of transistor mirroring window and PTM transition voltages on several device parameters to determine the optimum operating conditions and stance of tunability for each of the proposed SA topologies.

Exploiting Activation based Gradient Output Sparsity to Accelerate Backpropagation in CNNs

Sep 16, 2021

Anup Sarma, Sonali Singh, Huaipan Jiang, Ashutosh Pattnaik, Asit K Mishra, Vijaykrishnan Narayanan, Mahmut T Kandemir, Chita R Das

Abstract: Machine/deep-learning (ML/DL) based techniques are emerging as a driving force behind many cutting-edge technologies, achieving high accuracy on computer vision workloads such as image classification and object detection. However, training these models, which involve large numbers of parameters, is both time-consuming and energy-intensive. In this regard, several prior works have advocated for sparsity to speed up DL training and, even more so, the inference phase. This work begins with the observation that during training, sparsity in the forward pass and sparsity in the backward pass are correlated. In that context, we investigate two types of sparsity (input and output type) inherent in gradient descent-based optimization algorithms and propose a hardware micro-architecture to leverage them. Our experimental results use five state-of-the-art CNN models on the ImageNet dataset and show backpropagation speedups in the range of 1.69x to 5.43x compared to the dense baseline execution. By exploiting sparsity in both the forward and backward passes, speedup improvements range from 1.68x to 3.30x over the sparsity-agnostic baseline execution. Our work also achieves a significant reduction in training iteration time over several previously proposed dense as well as sparse accelerator-based platforms, in addition to achieving order-of-magnitude energy efficiency improvements over GPU-based execution.
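
One common way forward and backward sparsity become correlated is through ReLU: wherever the forward output is zero, the gradient with respect to the pre-activation is zero as well. The sketch below illustrates only that ReLU case with random data. It is an assumption about the kind of correlation involved, not the paper's micro-architecture or its definition of input and output sparsity, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward pass: ReLU zeroes every negative pre-activation.
pre_activation = rng.standard_normal((4, 8))
output = np.maximum(pre_activation, 0.0)

# Backward pass: the ReLU derivative is 1 where the input was positive, else 0,
# so the incoming gradient is masked by the same pattern.
upstream_grad = rng.standard_normal((4, 8))
grad_pre = upstream_grad * (pre_activation > 0)

forward_zero = output == 0.0
backward_zero = grad_pre == 0.0

# Every position that was zero in the forward pass is also zero in the backward pass,
# so the forward mask can be used to skip the corresponding gradient computations.
print("forward sparsity :", forward_zero.mean())
print("backward sparsity:", backward_zero.mean())
print("forward zeros imply backward zeros:", bool(np.all(backward_zero[forward_zero])))
```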

Transformer-based Machine Learning for Fast SAT Solvers and Logic Synthesis

Jul 15, 2021

Feng Shi, Chonghan Lee, Mohammad Khairul Bashar, Nikhil Shukla, Song-Chun Zhu, Vijaykrishnan Narayanan

Abstract: CNF-based SAT and MaxSAT solvers are central to logic synthesis and verification systems. The increasing popularity of these constraint problems in electronic design automation encourages studies on different SAT problems and their properties for further computational efficiency. Modern conflict-driven clause learning (CDCL) SAT solvers have had both theoretical and practical success, solving very large industrial instances in a relatively short amount of time. Recently, machine learning approaches have provided a new dimension to solving this challenging problem. Neural symbolic models could serve as generic solvers that can be specialized for specific domains based on data without any changes to the structure of the model. In this work, we propose a one-shot model derived from the Transformer architecture to solve the MaxSAT problem, which is the optimization version of SAT where the goal is to satisfy the maximum number of clauses. Our model has a scale-free structure that can process instances of varying size. We use meta-paths and a self-attention mechanism to capture interactions among homogeneous nodes. We adopt cross-attention mechanisms on the bipartite graph to capture interactions among heterogeneous nodes. We further apply an iterative algorithm to our model to satisfy additional clauses, enabling a solution approaching that of an exact-SAT problem. The attention mechanisms leverage parallelism for speedup. Our evaluation indicates improved speedup compared to heuristic approaches and improved completion rate compared to machine learning approaches.
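
To make the MaxSAT objective concrete, the following minimal sketch counts how many clauses of a small CNF formula an assignment satisfies and finds the maximum by exhaustive search. This is only an illustration of the problem definition, not the Transformer-based model; the signed-integer clause encoding and the function names are assumptions chosen for the example.

```python
from itertools import product

def count_satisfied(clauses, assignment):
    """Count clauses with at least one true literal.
    A literal k means variable k is true; -k means variable k is false."""
    return sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def brute_force_maxsat(clauses, num_vars):
    """Try every assignment; only feasible for very small instances."""
    best_count, best_assignment = -1, None
    for values in product([False, True], repeat=num_vars):
        assignment = {i + 1: value for i, value in enumerate(values)}
        count = count_satisfied(clauses, assignment)
        if count > best_count:
            best_count, best_assignment = count, assignment
    return best_count, best_assignment

# (x1) AND (NOT x1) AND (x1 OR x2) AND (NOT x2): unsatisfiable as a whole,
# so MaxSAT asks for the assignment that satisfies as many clauses as possible.
clauses = [[1], [-1], [1, 2], [-2]]
best_count, best_assignment = brute_force_maxsat(clauses, num_vars=2)
print(best_count, best_assignment)
```

Exhaustive search grows as 2^n in the number of variables, which is why heuristic and learned solvers are of interest for larger instances.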

STAR: Sparse Transformer-based Action Recognition

Jul 15, 2021

Feng Shi, Chonghan Lee, Liang Qiu, Yizhou Zhao, Tianyi Shen, Shivran Muralidhar, Tian Han, Song-Chun Zhu, Vijaykrishnan Narayanan

Abstract: The cognitive system for human action and behavior has evolved into a deep learning regime, and in particular the advent of Graph Convolution Networks has transformed the field in recent years. However, previous works have mainly focused on over-parameterized and complex models based on dense graph convolution networks, resulting in low efficiency in training and inference. Meanwhile, Transformer-based models have not yet been well explored for cognitive applications in human action and behavior estimation. This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of the data. Our model can also process video clips of variable length grouped as a single batch. Experiments show that our model achieves comparable performance while using far fewer trainable parameters and running faster in training and inference: a 4x to 18x speedup and 1/7 to 1/15 of the model size compared with the baseline models at competitive accuracy.

Communication-efficient k-Means for Edge-based Machine Learning

Feb 08, 2021

Hanlin Lu, Ting He, Shiqiang Wang, Changchang Liu, Mehrdad Mahdavi, Vijaykrishnan Narayanan, Kevin S. Chan, Stephen Pasteris

Abstract: We consider the problem of computing the k-means centers for a large high-dimensional dataset in the context of edge-based machine learning, where data sources offload machine learning computation to nearby edge servers. k-Means computation is fundamental to many data analytics, and the capability of computing provably accurate k-means centers by leveraging the computation power of the edge servers, at a low communication and computation cost to the data sources, will greatly improve the performance of these analytics. We propose to let the data sources send small summaries, generated by joint dimensionality reduction (DR) and cardinality reduction (CR), to support approximate k-means computation at reduced complexity and communication cost. By analyzing the complexity, the communication cost, and the approximation error of k-means algorithms based on state-of-the-art DR/CR methods, we show that: (i) it is possible to achieve a near-optimal approximation at a near-linear complexity and a constant or logarithmic communication cost, (ii) the order of applying DR and CR significantly affects the complexity and the communication cost, and (iii) combining DR/CR methods with a properly configured quantizer can further reduce the communication cost without compromising the other performance metrics. Our findings are validated through experiments based on real datasets.
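
The notion of a small summary built from joint DR and CR can be sketched as follows. The sketch swaps in two deliberately simple stand-ins, a Gaussian random projection for DR and uniform sampling with weights for CR, rather than the state-of-the-art methods the paper analyzes; the function name `summarize`, the dataset, and all sizes are hypothetical.

```python
import numpy as np

def summarize(data, target_dim, sample_size, rng):
    """Build a weighted summary: project to fewer dimensions (DR),
    then keep a uniform sample of the points (CR)."""
    n, d = data.shape
    projection = rng.standard_normal((d, target_dim)) / np.sqrt(target_dim)
    reduced = data @ projection                          # DR: n x target_dim
    idx = rng.choice(n, size=sample_size, replace=False) # CR: keep sample_size points
    weights = np.full(sample_size, n / sample_size)      # each sample stands in for n/sample_size points
    return reduced[idx], weights

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 64))                   # hypothetical data at the data source
summary, weights = summarize(data, target_dim=8, sample_size=100, rng=rng)

# Scalars sent to the edge server versus scalars in the raw dataset.
compression = data.size / (summary.size + weights.size)
print(summary.shape, f"compression factor: {compression:.1f}")
```

Here DR is applied to all 1000 points before sampling. Sampling first and projecting only the 100 kept points would yield a summary of the same size at lower cost to the data source, which gives a small-scale sense of why the order of DR and CR matters.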
