Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Vikas Singh

NeuroScalar: A Deep Learning Framework for Fast, Accurate, and In-the-Wild Cycle-Level Performance Prediction

Sep 26, 2025

Shayne Wadle, Yanxin Zhang, Vikas Singh, Karthikeyan Sankaralingam

Figure 1 for NeuroScalar: A Deep Learning Framework for Fast, Accurate, and In-the-Wild Cycle-Level Performance Prediction

Figure 2 for NeuroScalar: A Deep Learning Framework for Fast, Accurate, and In-the-Wild Cycle-Level Performance Prediction

Figure 3 for NeuroScalar: A Deep Learning Framework for Fast, Accurate, and In-the-Wild Cycle-Level Performance Prediction

Figure 4 for NeuroScalar: A Deep Learning Framework for Fast, Accurate, and In-the-Wild Cycle-Level Performance Prediction

Abstract:The evaluation of new microprocessor designs is constrained by slow, cycle-accurate simulators that rely on unrepresentative benchmark traces. This paper introduces a novel deep learning framework for high-fidelity, ``in-the-wild'' simulation on production hardware. Our core contribution is a DL model trained on microarchitecture-independent features to predict cycle-level performance for hypothetical processor designs. This unique approach allows the model to be deployed on existing silicon to evaluate future hardware. We propose a complete system featuring a lightweight hardware trace collector and a principled sampling strategy to minimize user impact. This system achieves a simulation speed of 5 MIPS on a commodity GPU, imposing a mere 0.1% performance overhead. Furthermore, our co-designed Neutrino on-chip accelerator improves performance by 85x over the GPU. We demonstrate that this framework enables accurate performance analysis and large-scale hardware A/B testing on a massive scale using real-world applications.

Via

Access Paper or Ask Questions

Composing Linear Layers from Irreducibles

Jul 15, 2025

Travis Pence, Daisuke Yamada, Vikas Singh

Abstract:Contemporary large models often exhibit behaviors suggesting the presence of low-level primitives that compose into modules with richer functionality, but these fundamental building blocks remain poorly understood. We investigate this compositional structure in linear layers by asking: can we identify/synthesize linear transformations from a minimal set of geometric primitives? Using Clifford algebra, we show that linear layers can be expressed as compositions of bivectors -- geometric objects encoding oriented planes -- and introduce a differentiable algorithm that decomposes them into products of rotors. This construction uses only O(log^2 d) parameters, versus O(d^2) required by dense matrices. Applied to the key, query, and value projections in LLM attention layers, our rotor-based layers match the performance of strong baselines such as block-Hadamard and low-rank approximations. Our findings provide an algebraic perspective on how these geometric primitives can compose into higher-level functions within deep models.

* 27 Pages, 13 Tables, 8 Figures

Via

Access Paper or Ask Questions

Brain-wide interpolation and conditioning of gene expression in the human brain using Implicit Neural Representations

Jun 11, 2025

Xizheng Yu, Justin Torok, Sneha Pandya, Sourav Pal, Vikas Singh, Ashish Raj

Abstract:In this paper, we study the efficacy and utility of recent advances in non-local, non-linear image interpolation and extrapolation algorithms, specifically, ideas based on Implicit Neural Representations (INR), as a tool for analysis of spatial transcriptomics data. We seek to utilize the microarray gene expression data sparsely sampled in the healthy human brain, and produce fully resolved spatial maps of any given gene across the whole brain at a voxel-level resolution. To do so, we first obtained the 100 top AD risk genes, whose baseline spatial transcriptional profiles were obtained from the Allen Human Brain Atlas (AHBA). We adapted Implicit Neural Representation models so that the pipeline can produce robust voxel-resolution quantitative maps of all genes. We present a variety of experiments using interpolations obtained from Abagen as a baseline/reference.

Via

Access Paper or Ask Questions

Real-Time Brain Tumor Detection in Intraoperative Ultrasound Using YOLO11: From Model Training to Deployment in the Operating Room

Jan 27, 2025

Santiago Cepeda, Olga Esteban-Sinovas, Roberto Romero, Vikas Singh, Prakash Shetty, Aliasgar Moiyadi, Ilyess Zemmoura, Giuseppe Roberto Giammalva, Massimiliano Del Bene, Arianna Barbotti(+6 more)

Figure 1 for Real-Time Brain Tumor Detection in Intraoperative Ultrasound Using YOLO11: From Model Training to Deployment in the Operating Room

Figure 2 for Real-Time Brain Tumor Detection in Intraoperative Ultrasound Using YOLO11: From Model Training to Deployment in the Operating Room

Figure 3 for Real-Time Brain Tumor Detection in Intraoperative Ultrasound Using YOLO11: From Model Training to Deployment in the Operating Room

Figure 4 for Real-Time Brain Tumor Detection in Intraoperative Ultrasound Using YOLO11: From Model Training to Deployment in the Operating Room

Abstract:Intraoperative ultrasound (ioUS) is a valuable tool in brain tumor surgery due to its versatility, affordability, and seamless integration into the surgical workflow. However, its adoption remains limited, primarily because of the challenges associated with image interpretation and the steep learning curve required for effective use. This study aimed to enhance the interpretability of ioUS images by developing a real-time brain tumor detection system deployable in the operating room. We collected 2D ioUS images from the Brain Tumor Intraoperative Database (BraTioUS) and the public ReMIND dataset, annotated with expert-refined tumor labels. Using the YOLO11 architecture and its variants, we trained object detection models to identify brain tumors. The dataset included 1,732 images from 192 patients, divided into training, validation, and test sets. Data augmentation expanded the training set to 11,570 images. In the test dataset, YOLO11s achieved the best balance of precision and computational efficiency, with a mAP@50 of 0.95, mAP@50-95 of 0.65, and a processing speed of 34.16 frames per second. The proposed solution was prospectively validated in a cohort of 15 consecutively operated patients diagnosed with brain tumors. Neurosurgeons confirmed its seamless integration into the surgical workflow, with real-time predictions accurately delineating tumor regions. These findings highlight the potential of real-time object detection algorithms to enhance ioUS-guided brain tumor surgery, addressing key challenges in interpretation and providing a foundation for future development of computer vision-based tools for neuro-oncological surgery.

Via

Access Paper or Ask Questions

Hotel Booking Cancellation Prediction Using Applied Bayesian Models

Oct 23, 2024

Md Asifuzzaman Jishan, Vikas Singh, Ayan Kumar Ghosh, Md Shahabub Alam, Khan Raqib Mahmud, Bijan Paul

Abstract:This study applies Bayesian models to predict hotel booking cancellations, a key challenge affecting resource allocation, revenue, and customer satisfaction in the hospitality industry. Using a Kaggle dataset with 36,285 observations and 17 features, Bayesian Logistic Regression and Beta-Binomial models were implemented. The logistic model, applied to 12 features and 5,000 randomly selected observations, outperformed the Beta-Binomial model in predictive accuracy. Key predictors included the number of adults, children, stay duration, lead time, car parking space, room type, and special requests. Model evaluation using Leave-One-Out Cross-Validation (LOO-CV) confirmed strong alignment between observed and predicted outcomes, demonstrating the model's robustness. Special requests and parking availability were found to be the strongest predictors of cancellation. This Bayesian approach provides a valuable tool for improving booking management and operational efficiency in the hotel industry.

Via

Access Paper or Ask Questions

Variational Sampling of Temporal Trajectories

Mar 18, 2024

Jurijs Nazarovs, Zhichun Huang, Xingjian Zhen, Sourav Pal, Rudrasis Chakraborty, Vikas Singh

Abstract:A deterministic temporal process can be determined by its trajectory, an element in the product space of (a) initial condition $z_0 \in \mathcal{Z}$ and (b) transition function $f: (\mathcal{Z}, \mathcal{T}) \to \mathcal{Z}$ often influenced by the control of the underlying dynamical system. Existing methods often model the transition function as a differential equation or as a recurrent neural network. Despite their effectiveness in predicting future measurements, few results have successfully established a method for sampling and statistical inference of trajectories using neural networks, partially due to constraints in the parameterization. In this work, we introduce a mechanism to learn the distribution of trajectories by parameterizing the transition function $f$ explicitly as an element in a function space. Our framework allows efficient synthesis of novel trajectories, while also directly providing a convenient tool for inference, i.e., uncertainty estimation, likelihood evaluations and out of distribution detection for abnormal trajectories. These capabilities can have implications for various downstream tasks, e.g., simulation and evaluation for reinforcement learning.

Via

Access Paper or Ask Questions

Pooling Image Datasets With Multiple Covariate Shift and Imbalance

Mar 14, 2024

Sotirios Panagiotis Chytas, Vishnu Suresh Lokhande, Peiran Li, Vikas Singh

Figure 1 for Pooling Image Datasets With Multiple Covariate Shift and Imbalance

Figure 2 for Pooling Image Datasets With Multiple Covariate Shift and Imbalance

Figure 3 for Pooling Image Datasets With Multiple Covariate Shift and Imbalance

Figure 4 for Pooling Image Datasets With Multiple Covariate Shift and Imbalance

Abstract:Small sample sizes are common in many disciplines, which necessitates pooling roughly similar datasets across multiple institutions to study weak but relevant associations between images and disease outcomes. Such data often manifest shift/imbalance in covariates (i.e., secondary non-imaging data). Controlling for such nuisance variables is common within standard statistical analysis, but the ideas do not directly apply to overparameterized models. Consequently, recent work has shown how strategies from invariant representation learning provides a meaningful starting point, but the current repertoire of methods is limited to accounting for shifts/imbalances in just a couple of covariates at a time. In this paper, we show how viewing this problem from the perspective of Category theory provides a simple and effective solution that completely avoids elaborate multi-stage training pipelines that would otherwise be needed. We show the effectiveness of this approach via extensive experiments on real datasets. Further, we discuss how this style of formulation offers a unified perspective on at least 5+ distinct problem settings, from self-supervised learning to matching problems in 3D reconstruction.

* We need to do some fixes of references to make them more precise. This paper will be corrected and uploaded again by another group member

Via

Access Paper or Ask Questions

LookupFFN: Making Transformers Compute-lite for CPU inference

Mar 12, 2024

Zhanpeng Zeng, Michael Davies, Pranav Pulijala, Karthikeyan Sankaralingam, Vikas Singh

Abstract:While GPU clusters are the de facto choice for training large deep neural network (DNN) models today, several reasons including ease of workflow, security and cost have led to efforts investigating whether CPUs may be viable for inference in routine use in many sectors of the industry. But the imbalance between the compute capabilities of GPUs and CPUs is huge. Motivated by these considerations, we study a module which is a workhorse within modern DNN architectures, GEMM based Feed Forward Networks (FFNs), and assess the extent to which it can be made compute- (or FLOP-) lite. Specifically, we propose an alternative formulation (we call it LookupFFN) to GEMM based FFNs inspired by the recent studies of using Locality Sensitive Hashing (LSH) to approximate FFNs. Our formulation recasts most essential operations as a memory look-up, leveraging the trade-off between the two resources on any platform: compute and memory (since CPUs offer it in abundance). For RoBERTa language model pretraining, our formulation achieves similar performance compared to GEMM based FFNs, while dramatically reducing the required FLOP. Our development is complemented with a detailed hardware profiling of strategies that will maximize efficiency -- not just on contemporary hardware but on products that will be offered in the near/medium term future. Code is avaiable at \url{https://github.com/mlpen/LookupFFN}.

* ICML 2023

Via

Access Paper or Ask Questions

IM-Unpack: Training and Inference with Arbitrarily Low Precision Integers

Mar 12, 2024

Zhanpeng Zeng, Karthikeyan Sankaralingam, Vikas Singh

Abstract:GEneral Matrix Multiply (GEMM) is a central operation in deep learning and corresponds to the largest chunk of the compute footprint. Therefore, improving its efficiency is an active topic of ongoing research. A popular strategy is the use of low bit-width integers to approximate the original entries in a matrix. This allows efficiency gains, but often requires sophisticated techniques to control the rounding error incurred. In this work, we first verify/check that when the low bit-width restriction is removed, for a variety of Transformer-based models, whether integers are sufficient for all GEMMs need -- for {\em both} training and inference stages, and can achieve parity with floating point counterparts. No sophisticated techniques are needed. We find that while a large majority of entries in matrices (encountered in such models) can be easily represented by {\em low} bit-width integers, the existence of a few heavy hitter entries make it difficult to achieve efficiency gains via the exclusive use of low bit-width GEMMs alone. To address this issue, we develop a simple algorithm, Integer Matrix Unpacking (IM-Unpack), to {\em unpack} a matrix with large integer entries into a larger matrix whose entries all lie within the representable range of arbitrarily low bit-width integers. This allows {\em equivalence} with the original GEMM, i.e., the exact result can be obtained using purely low bit-width integer GEMMs. This comes at the cost of additional operations -- we show that for many popular models, this overhead is quite small.

Via

Access Paper or Ask Questions

FrameQuant: Flexible Low-Bit Quantization for Transformers

Mar 10, 2024

Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang, Vikas Singh

Figure 1 for FrameQuant: Flexible Low-Bit Quantization for Transformers

Figure 2 for FrameQuant: Flexible Low-Bit Quantization for Transformers

Figure 3 for FrameQuant: Flexible Low-Bit Quantization for Transformers

Figure 4 for FrameQuant: Flexible Low-Bit Quantization for Transformers

Abstract:Transformers are the backbone of powerful foundation models for many Vision and Natural Language Processing tasks. But their compute and memory/storage footprint is large, and so, serving such models is expensive often requiring high-end hardware. To mitigate this difficulty, Post-Training Quantization seeks to modify a pre-trained model and quantize it to eight bits or lower, significantly boosting compute/memory/latency efficiency. Such models have been successfully quantized to four bits with some performance loss. In this work, we outline a simple scheme to quantize Transformer-based models to just two bits (plus some overhead) with only a small drop in accuracy. Key to our formulation is a concept borrowed from Harmonic analysis called Fusion Frames. Our main finding is that the quantization must take place not in the original weight space, but instead in the Fusion Frame representations. If quantization is interpreted as the addition of noise, our casting of the problem allows invoking an extensive body of known consistent recovery and noise robustness guarantees. Further, if desired, de-noising filters are known in closed form. We show empirically, via a variety of experiments, that (almost) two-bit quantization for Transformer models promises sizable efficiency gains.

* 25 pages, 15 figures

Via

Access Paper or Ask Questions