Mansooreh Montazerin

SWAT-NN: Simultaneous Weights and Architecture Training for Neural Networks in a Latent Space

Jun 11, 2025

Zitong Huang, Mansooreh Montazerin, Ajitesh Srivastava

Abstract: Designing neural networks typically relies on manual trial and error or a neural architecture search (NAS) followed by weight training. The former is time-consuming and labor-intensive, while the latter often treats architecture search and weight optimization as separate, discrete steps. In this paper, we propose a fundamentally different approach that simultaneously optimizes both the architecture and the weights of a neural network. Our framework first trains a universal multi-scale autoencoder that embeds both architectural and parametric information into a continuous latent space, where functionally similar neural networks are mapped closer together. Given a dataset, we then randomly initialize a point in the embedding space and update it via gradient descent to obtain the optimal neural network, jointly optimizing its structure and weights. The optimization process incorporates sparsity and compactness penalties to promote efficient models. Experiments on synthetic regression tasks demonstrate that our method effectively discovers sparse and compact neural networks with strong performance.
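The latent-space optimization described in this abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: `decode` is a hand-written fixed linear map standing in for a trained decoder, the "network" is a three-weight linear model, the gradient is estimated by central differences rather than automatic differentiation, the start point is fixed rather than random so the run is repeatable, and a single L1 term stands in for the sparsity and compactness penalties.

```python
def decode(z):
    # Hypothetical decoder: maps a 2-D latent point to 3 model weights.
    return [z[0] + z[1], z[0] - z[1], 0.5 * z[1]]

def predict(w, x):
    # Linear model standing in for the decoded network.
    return sum(wi * xi for wi, xi in zip(w, x))

def loss(z, data, lam=0.01):
    # Data-fit term (MSE) plus an L1 sparsity penalty on the decoded weights.
    w = decode(z)
    mse = sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)
    return mse + lam * sum(abs(wi) for wi in w)

def grad(z, data, h=1e-5):
    # Central-difference estimate of d(loss)/dz.
    g = []
    for i in range(len(z)):
        zp, zm = list(z), list(z)
        zp[i] += h
        zm[i] -= h
        g.append((loss(zp, data) - loss(zm, data)) / (2 * h))
    return g

def optimize(z, data, lr=0.05, steps=500):
    # Plain gradient descent on the latent point, not on the weights.
    for _ in range(steps):
        z = [zi - lr * gi for zi, gi in zip(z, grad(z, data))]
    return z

# Toy regression data generated by y = 2 * x0.
data = [((1, 0, 0), 2.0), ((0, 1, 0), 0.0), ((0, 0, 1), 0.0), ((1, 1, 0), 2.0)]
z_start = [0.0, 0.0]
z_final = optimize(z_start, data)
print("loss before:", loss(z_start, data))
print("loss after: ", loss(z_final, data))
print("decoded weights:", decode(z_final))
```

The point of the sketch is only that the optimization variable is the latent point `z`; the model weights are always obtained by decoding it.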

Sparse Interpretable Deep Learning with LIES Networks for Symbolic Regression

Jun 09, 2025

Mansooreh Montazerin, Majd Al Aawar, Antonio Ortega, Ajitesh Srivastava

Abstract: Symbolic regression (SR) aims to discover closed-form mathematical expressions that accurately describe data, offering interpretability and analytical insight beyond standard black-box models. Existing SR methods often rely on population-based search or autoregressive modeling, which struggle with scalability and symbolic consistency. We introduce LIES (Logarithm, Identity, Exponential, Sine), a fixed neural network architecture with interpretable primitive activations that are optimized to model symbolic expressions. We develop a framework to extract compact formulae from LIES networks by training with an appropriate oversampling strategy and a tailored loss function to promote sparsity and to prevent gradient instability. After training, the framework applies additional pruning strategies to further simplify the learned expressions into compact formulae. Our experiments on SR benchmarks show that the LIES framework consistently produces sparse and accurate symbolic formulae, outperforming all baselines. We also demonstrate the importance of each design component through ablation studies.
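To make the four primitive activations concrete, here is a minimal sketch of a single unit that mixes logarithm, identity, exponential, and sine, and of reading a formula off its weights. The names `lies_unit` and `read_formula`, the guarded logarithm, the clamped exponential, and the pruning tolerance are all hypothetical choices for illustration; the abstract does not specify how the paper handles any of them.

```python
import math

def safe_log(x, eps=1e-6):
    # Assumption: guard the logarithm so non-positive inputs do not fail.
    return math.log(max(abs(x), eps))

def safe_exp(x, cap=50.0):
    # Assumption: clamp the exponent to avoid overflow.
    return math.exp(min(x, cap))

PRIMITIVES = {
    "log": safe_log,
    "identity": lambda x: x,
    "exp": safe_exp,
    "sin": math.sin,
}

def lies_unit(x, weights):
    # Weighted sum of the primitives applied to a scalar input.
    return sum(w * PRIMITIVES[name](x) for name, w in weights.items())

def read_formula(weights, tol=1e-3):
    # Drop near-zero weights and render the surviving terms as text.
    terms = [f"{w:g}*{name}(x)" for name, w in weights.items() if abs(w) > tol]
    return " + ".join(terms) if terms else "0"

weights = {"log": 0.0, "identity": 2.0, "exp": 0.0004, "sin": 1.0}
print(read_formula(weights))
print(lies_unit(0.5, weights))
```

Because every activation is a named mathematical function, a sparse set of surviving weights maps directly onto a readable expression, which is why sparsity matters for this kind of model.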

Simultaneous Weight and Architecture Optimization for Neural Networks

Oct 10, 2024

Zitong Huang, Mansooreh Montazerin, Ajitesh Srivastava

Abstract: Neural networks are trained by choosing an architecture and training the parameters. The choice of architecture is often made by trial and error or with Neural Architecture Search (NAS) methods. While NAS provides some automation, it often relies on discrete steps that optimize the architecture and then train the parameters. We introduce a novel neural network training framework that fundamentally transforms the process by learning architecture and parameters simultaneously with gradient descent. With the appropriate setting of the loss function, it can discover sparse and compact neural networks for given datasets. Central to our approach is a multi-scale encoder-decoder, in which the encoder embeds pairs of neural networks with similar functionalities close to each other (irrespective of their architectures and weights). To train a neural network with a given dataset, we randomly sample a neural network embedding in the embedding space and then perform gradient descent using our custom loss function, which incorporates a sparsity penalty to encourage compactness. The decoder generates a neural network corresponding to the embedding. Experiments demonstrate that our framework can discover sparse and compact neural networks that maintain high performance.

* Accepted to NeurIPS 2024 FITML (Fine-Tuning in Modern Machine Learning) Workshop

Global Prediction of COVID-19 Variant Emergence Using Dynamics-Informed Graph Neural Networks

Jan 07, 2024

Majd Al Aawar, Srikar Mutnuri, Mansooreh Montazerin, Ajitesh Srivastava

Abstract: During the COVID-19 pandemic, a major driver of new surges has been the emergence of new variants. When a new variant emerges in one or more countries, other nations monitor its spread in preparation for its potential arrival. The impact of the variant and the timing of epidemic peaks in a country highly depend on when the variant arrives. The current methods for predicting the spread of new variants rely on statistical modeling; however, these methods work only when the new variant has already arrived in the region of interest and has a significant prevalence. The question arises: Can we predict when (and if) a variant that exists elsewhere will arrive in a given country and reach a certain prevalence? We propose a variant-dynamics-informed Graph Neural Network (GNN) approach. First, we derive the dynamics of variant prevalence across pairs of regions (countries) that apply to a large class of epidemic models. The dynamics suggest that ratios of variant proportions lead to simpler patterns. Therefore, we use ratios of variant proportions along with some parameters estimated from the dynamics as features in a GNN. We develop a benchmarking tool to evaluate variant emergence prediction over 87 countries and 36 variants. We leverage this tool to compare our GNN-based approach against our dynamics-only model and a number of machine learning models. Results show that the proposed dynamics-informed GNN method retrospectively outperforms all the baselines, including the currently pervasive framework of Physics-Informed Neural Networks (PINNs) that incorporates the dynamics in the loss function.

Transformer-based Hand Gesture Recognition via High-Density EMG Signals: From Instantaneous Recognition to Fusion of Motor Unit Spike Trains

Dec 07, 2022

Mansooreh Montazerin, Elahe Rahimian, Farnoosh Naderkhani, S. Farokh Atashzar, Svetlana Yanushkevich, Arash Mohammadi

Abstract: Designing efficient and labor-saving prosthetic hands requires powerful hand gesture recognition algorithms that can achieve high accuracy with limited complexity and latency. In this context, the paper proposes a compact deep learning framework referred to as the CT-HGR, which employs a vision transformer network to conduct hand gesture recognition using high-density sEMG (HD-sEMG) signals. The attention mechanism in the proposed model identifies similarities among different data segments with a greater capacity for parallel computations and addresses the memory limitation problems while dealing with inputs of large sequence lengths. CT-HGR can be trained from scratch without any need for transfer learning and can simultaneously extract both temporal and spatial features of HD-sEMG data. Additionally, the CT-HGR framework can perform instantaneous recognition using an sEMG image spatially composed from HD-sEMG signals. A variant of the CT-HGR is also designed to incorporate microscopic neural drive information in the form of Motor Unit Spike Trains (MUSTs) extracted from HD-sEMG signals using Blind Source Separation (BSS). This variant is combined with its baseline version via a hybrid architecture to evaluate the potential of fusing macroscopic and microscopic neural drive information. The utilized HD-sEMG dataset involves 128 electrodes that collect the signals related to 65 isometric hand gestures of 20 subjects. The proposed CT-HGR framework is applied to window sizes of 31.25, 62.5, 125, and 250 ms of the above-mentioned dataset using 32, 64, and 128 electrode channels. The average accuracy over all the participants using 32 electrodes and a window size of 31.25 ms is 86.23%, which gradually increases to 91.98% for 128 electrodes and a window size of 250 ms. The CT-HGR achieves an accuracy of 89.13% for instantaneous recognition based on a single frame of HD-sEMG image.

HYDRA-HGR: A Hybrid Transformer-based Architecture for Fusion of Macroscopic and Microscopic Neural Drive Information

Oct 27, 2022

Mansooreh Montazerin, Elahe Rahimian, Farnoosh Naderkhani, S. Farokh Atashzar, Hamid Alinejad-Rokny, Arash Mohammadi

Abstract: Development of advanced surface Electromyogram (sEMG)-based Human-Machine Interface (HMI) systems is of paramount importance to pave the way towards emergence of futuristic Cyber-Physical-Human (CPH) worlds. In this context, the main focus of recent literature was on development of different Deep Neural Network (DNN)-based architectures that perform Hand Gesture Recognition (HGR) at a macroscopic level (i.e., directly from sEMG signals). At the same time, advancements in acquisition of High-Density sEMG signals (HD-sEMG) have resulted in a surge of significant interest in sEMG decomposition techniques to extract microscopic neural drive information. However, due to complexities of sEMG decomposition and added computational overhead, HGR at the microscopic level is less explored than its aforementioned DNN-based counterparts. In this regard, we propose the HYDRA-HGR framework, which is a hybrid model that simultaneously extracts a set of temporal and spatial features through its two independent Vision Transformer (ViT)-based parallel architectures (the so-called Macro and Micro paths). The Macro path is trained directly on the pre-processed HD-sEMG signals, while the Micro path is fed with the peak-to-peak (p-to-p) values of the extracted Motor Unit Action Potentials (MUAPs) of each source. Extracted features at macroscopic and microscopic levels are then coupled via a Fully Connected (FC) fusion layer. We evaluate the proposed hybrid HYDRA-HGR framework through a recently released HD-sEMG dataset, and show that it significantly outperforms its stand-alone counterparts. The proposed HYDRA-HGR framework achieves an average accuracy of 94.86% for the 250 ms window size, which is 5.52% and 8.22% higher than that of the Macro and Micro paths, respectively.

ViT-HGR: Vision Transformer-based Hand Gesture Recognition from High Density Surface EMG Signals

Jan 25, 2022

Mansooreh Montazerin, Soheil Zabihi, Elahe Rahimian, Arash Mohammadi, Farnoosh Naderkhani

Abstract: Recently, there has been a surge of significant interest in the application of Deep Learning (DL) models to autonomously perform hand gesture recognition using surface Electromyogram (sEMG) signals. DL models are, however, mainly designed to be applied to sparse sEMG signals. Furthermore, due to their complex structure, they typically face memory constraints, require long training times and a large number of training samples, and need to resort to data augmentation and/or transfer learning. In this paper, for the first time (to the best of our knowledge), we investigate and design a Vision Transformer (ViT) based architecture to perform hand gesture recognition from High-Density sEMG (HD-sEMG) signals. Intuitively speaking, we capitalize on the recent breakthrough role of the transformer architecture in tackling different complex problems together with its potential for employing more input parallelization via its attention mechanism. The proposed Vision Transformer-based Hand Gesture Recognition (ViT-HGR) framework can overcome the aforementioned training time problems and can accurately classify a large number of hand gestures from scratch without any need for data augmentation and/or transfer learning. The efficiency of the proposed ViT-HGR framework is evaluated using a recently released HD-sEMG dataset consisting of 65 isometric hand gestures. Our experiments with a 64-sample (31.25 ms) window size yield an average test accuracy of 84.62 +/- 3.07%, where only 78,210 parameters are used. The compact structure of the proposed ViT-HGR framework (i.e., its significantly reduced number of trainable parameters) shows great potential for practical application in prosthetic control.
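The window size quoted above (64 samples for 31.25 ms) implies a sampling rate of 64 / 0.03125 s = 2048 Hz. The short sketch below works through that conversion and shows one common way to cut a signal into fixed-length windows. The helper names and the step size are assumptions for illustration; the abstract does not state how windows overlap.

```python
def samples_per_window(window_ms, sampling_rate_hz):
    # Convert a window length in milliseconds to a sample count.
    return round(window_ms * sampling_rate_hz / 1000)

def sliding_windows(signal, window, step):
    # Cut a 1-D sequence into fixed-length windows.
    # `step` (hypothetical) controls the overlap between windows.
    last_start = len(signal) - window
    return [signal[i:i + window] for i in range(0, last_start + 1, step)]

RATE_HZ = 2048  # implied by 64 samples spanning 31.25 ms
for ms in (31.25, 62.5, 125, 250):
    print(ms, "ms ->", samples_per_window(ms, RATE_HZ), "samples")

# Toy signal of 10 samples, windows of 4 samples advancing by 2.
print(sliding_windows(list(range(10)), window=4, step=2))
```

With real HD-sEMG recordings, each window would hold one such slice per electrode channel rather than a single sequence.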