Geoffrey C. Fox

Exploring the Energy Landscape of RBMs: Reciprocal Space Insights into Bosons, Hierarchical Learning and Symmetry Breaking

Mar 27, 2025

J. Quetzalcóatl Toledo-Marin, Anindita Maiti, Geoffrey C. Fox, Roger G. Melko

Abstract: Deep generative models have become ubiquitous due to their ability to learn and sample from complex distributions. Despite the proliferation of frameworks, the relationships among these models remain largely unexplored, a gap that hinders the development of a unified theory of AI learning. We address two central challenges: clarifying the connections between different deep generative models and deepening our understanding of their learning mechanisms. We focus on Restricted Boltzmann Machines (RBMs), known for their universal approximation capabilities for discrete distributions. By introducing a reciprocal space formulation, we reveal a connection between RBMs, diffusion processes, and coupled Bosons. We show that at initialization the RBM operates at a saddle point, where the local curvature is determined by the singular values of the weight matrix, whose distribution follows the Marchenko-Pastur law and exhibits rotational symmetry. During training, this rotational symmetry is broken by hierarchical learning, in which different degrees of freedom progressively capture features at multiple levels of abstraction. The resulting symmetry breaking in the energy landscape, reminiscent of Landau theory, is characterized by the singular values and the eigenvector matrix of the weight matrix. We derive the corresponding free energy in a mean-field approximation. We show that in the limit of an infinite-size RBM, the reciprocal variables are Gaussian distributed. Our findings indicate that in this regime, the diffusion process will not converge to the Boltzmann distribution for some modes. To illustrate our results, we trained replicas of RBMs with different hidden-layer sizes on the MNIST dataset. Our findings bridge the gap between disparate generative frameworks and shed light on the processes underpinning learning in generative models.

* 19 pages, 8 figures, research article
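The Marchenko-Pastur claim can be checked numerically without an eigensolver. The sketch below assumes an i.i.d. Gaussian initialization with entries drawn from N(0, 1) (the initialization scheme used in the paper is not stated in the abstract) and hypothetical sizes m = 60, n = 120. For C = W Wᵀ / n with aspect ratio q = m/n ≤ 1, the Marchenko-Pastur law predicts a mean eigenvalue of 1, a mean squared eigenvalue of 1 + q, and support [(1 − √q)², (1 + √q)²]; the singular values of W/√n are the square roots of these eigenvalues. Both moments come from traces, so only plain Python is needed.

```python
import math
import random


def mp_edges(q: float) -> tuple[float, float]:
    """Support [lambda_-, lambda_+] of the Marchenko-Pastur law for q = m/n <= 1."""
    return (1.0 - math.sqrt(q)) ** 2, (1.0 + math.sqrt(q)) ** 2


def spectral_moments(m: int, n: int, seed: int = 0) -> tuple[float, float]:
    """First two empirical eigenvalue moments of C = W W^T / n.

    W is m x n with i.i.d. N(0, 1) entries (an assumed initialization).
    """
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    C = [[sum(a * b for a, b in zip(W[i], W[j])) / n for j in range(m)]
         for i in range(m)]
    mean_eig = sum(C[i][i] for i in range(m)) / m           # tr(C) / m
    # C is symmetric, so tr(C^2) equals the sum of its squared entries.
    mean_sq_eig = sum(c * c for row in C for c in row) / m  # tr(C^2) / m
    return mean_eig, mean_sq_eig


m, n = 60, 120  # hypothetical visible/hidden sizes, q = 0.5
q = m / n
m1, m2 = spectral_moments(m, n)
lo, hi = mp_edges(q)
print(f"mean eigenvalue {m1:.3f} (MP: 1.000)")
print(f"mean squared eigenvalue {m2:.3f} (MP: {1 + q:.3f})")
print(f"singular values of W/sqrt(n) in [{math.sqrt(lo):.3f}, {math.sqrt(hi):.3f}]")
```

For finite matrices the expected value of tr(C²)/m is 1 + (m + 1)/n, so the second moment exceeds 1 + q by a correction of order 1/n that vanishes as the RBM grows.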

Residual Vision Transformer (ResViT) Based Self-Supervised Learning Model for Brain Tumor Classification

Nov 19, 2024

Meryem Altin Karagoz, O. Ufuk Nalbantoglu, Geoffrey C. Fox

Abstract: Deep learning has proven very promising for interpreting MRI in brain tumor diagnosis. However, deep learning models suffer from a scarcity of brain MRI datasets for effective training. Self-supervised learning (SSL) models provide data-efficient solutions to limited-dataset problems. This paper therefore introduces a two-stage generative SSL model for brain tumor classification. The first stage pre-trains a Residual Vision Transformer (ResViT) model on MRI synthesis as a pretext task. The second stage fine-tunes a ResViT-based classifier as the downstream task. The hybrid CNN-transformer architecture of ResViT, used in both the pretext and downstream tasks, is designed to leverage local features via the CNN and global features via the ViT. Synthetic MRI images are also used to balance the training set. The proposed model is evaluated on the public BraTS 2023, Figshare, and Kaggle datasets. We compare it with various deep learning models, including A-UNet, ResNet-9, pix2pix, and pGAN for MRI synthesis, and ConvNeXtTiny, ResNet101, DenseNet12, Residual CNN, and ViT for classification. The results show that pretraining the proposed model on the MRI dataset outperforms pretraining on the ImageNet dataset. Overall, the proposed model attains the highest accuracy, achieving 90.56% on the BraTS dataset with the T1 sequence, 98.53% on the Figshare dataset, and 98.47% on the Kaggle brain tumor dataset. By incorporating SSL, fine-tuning, data augmentation, and a combination of CNN and ViT, the proposed model offers a robust and effective approach to the challenge of insufficient datasets in MRI analysis.

Conditioned quantum-assisted deep generative surrogate for particle-calorimeter interactions

Oct 30, 2024

J. Quetzalcoatl Toledo-Marin, Sebastian Gonzalez, Hao Jia, Ian Lu, Deniz Sogutlu, Abhishek Abhishek, Colin Gay, Eric Paquet, Roger Melko, Geoffrey C. Fox(+2 more)

Abstract: Particle collisions at accelerators such as the Large Hadron Collider (LHC), recorded and analyzed by experiments such as ATLAS and CMS, enable exquisite measurements of the Standard Model and searches for new phenomena. Simulations of collision events at these detectors have played a pivotal role in shaping the design of future experiments and analyzing ongoing ones. However, the quest for accuracy in LHC collisions comes at an imposing computational cost, with projections estimating the need for millions of CPU-years annually during the High Luminosity LHC (HL-LHC) run \cite{collaboration2022atlas}. Simulating a single LHC event with Geant4 currently consumes around 1000 CPU seconds, with simulations of the calorimeter subdetectors imposing particularly substantial computational demands \cite{rousseau2023experimental}. To address this challenge, we propose a conditioned quantum-assisted deep generative model. Our model integrates a conditioned variational autoencoder (VAE) on the exterior with a conditioned Restricted Boltzmann Machine (RBM) in the latent space, providing enhanced expressiveness compared to conventional VAEs. The RBM nodes and connections are engineered to enable the use of qubits and couplers on D-Wave's Pegasus-structured Advantage quantum annealer (QA) for sampling. We introduce a novel method for conditioning the quantum-assisted RBM using flux biases. We further propose a novel adaptive mapping to estimate the effective inverse temperature in quantum annealers. The effectiveness of our framework is illustrated using Dataset 2 of the CaloChallenge \cite{calochallenge}.

* 26 pages, 10 figures, 8 appendices
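The abstract mentions estimating the effective inverse temperature β of a quantum annealer but does not describe the adaptive mapping itself. As a rough illustration of the underlying problem only, the sketch below uses a different, textbook estimator: if a sampler draws RBM states from p(s) ∝ exp(−β E(s)), then log p(s) = −β E(s) − log Z, so regressing the log of empirical state counts on energy gives −β as the slope. The tiny RBM (3 visible, 2 hidden binary units), its weights, and the "true" β = 1.5 are hypothetical, and the annealer is replaced by exact Boltzmann sampling with `random.choices`.

```python
import itertools
import math
import random
from collections import Counter

# Hypothetical tiny RBM: 3 visible and 2 hidden binary (0/1) units.
W = [[0.5, -0.3], [0.2, 0.4], [-0.6, 0.1]]
A = [0.1, -0.2, 0.3]  # visible biases
B = [-0.1, 0.2]       # hidden biases


def rbm_energy(v, h):
    """E(v, h) = -a.v - b.h - v^T W h."""
    field = sum(a * vi for a, vi in zip(A, v)) + sum(b * hj for b, hj in zip(B, h))
    coupling = sum(v[i] * W[i][j] * h[j]
                   for i in range(len(v)) for j in range(len(h)))
    return -(field + coupling)


def estimate_beta(energies, sample_indices, min_count=20):
    """Least-squares fit of log(count) = -beta * E + const over observed states."""
    counts = Counter(sample_indices)
    xs, ys = [], []
    for idx, e in enumerate(energies):
        c = counts.get(idx, 0)
        if c >= min_count:  # rarely seen states have noisy log-counts
            xs.append(e)
            ys.append(math.log(c))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope


states = list(itertools.product([0, 1], repeat=5))  # all (v, h) configurations
energies = [rbm_energy(s[:3], s[3:]) for s in states]
true_beta = 1.5  # stands in for the unknown effective temperature of a sampler
weights = [math.exp(-true_beta * e) for e in energies]
samples = random.Random(0).choices(range(len(states)), weights=weights, k=200_000)
beta_hat = estimate_beta(energies, samples)
print(f"estimated beta = {beta_hat:.3f}")
```

Exhaustive enumeration only works for toy models; for RBMs sized to a real annealer the partition function and most state frequencies are out of reach, which is one reason a dedicated estimation scheme is needed.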

TSEQPREDICTOR: Spatiotemporal Extreme Earthquakes Forecasting for Southern California

Dec 20, 2020

Bo Feng, Geoffrey C. Fox

Abstract: Over the past few decades, seismology has used the most advanced technologies and equipment to monitor seismic events globally. However, forecasting disasters such as earthquakes remains an underdeveloped topic. Recent research in spatiotemporal forecasting, an important topic in many scientific fields, has revealed some possibilities for successful prediction, and many of these studies apply deep neural networks successfully. In geoscience, earthquake prediction is one of the world's most challenging problems, and cutting-edge deep learning technologies may help to discover useful patterns. In this project, we propose a joint deep learning modeling method for earthquake forecasting, named TSEQPREDICTOR. TSEQPREDICTOR combines deep learning technologies with domain knowledge in seismology and approaches the prediction problem using encoder-decoder and temporal convolutional neural networks. Compared with several state-of-the-art recurrent neural networks, our experiments show that our method is promising for predicting major shocks in Southern California.
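Temporal convolutional networks are usually built from causal, dilated 1D convolutions: the output at step t only sees inputs at t, t − d, t − 2d, and so on, and stacking layers with growing dilations widens the window of past steps each output can draw on. The pure-Python sketch below shows that building block in its common form; it is not the TSEQPREDICTOR architecture, and the kernel values and input series are hypothetical (it also omits learned weights and residual connections).

```python
def causal_dilated_conv1d(x, kernel, dilation=1, bias=0.0):
    """y[t] = bias + sum_k kernel[k] * x[t - k * dilation], zero-padded on the left.

    kernel[0] multiplies the current step, so y[t] never depends on x[t + 1:].
    """
    y = []
    for t in range(len(x)):
        acc = bias
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def tcn_stack(x, kernels, dilations):
    """Apply one causal dilated convolution plus ReLU per dilation."""
    for kernel, d in zip(kernels, dilations):
        x = [max(0.0, v) for v in causal_dilated_conv1d(x, kernel, d)]
    return x


def receptive_field(kernel_size, dilations):
    """Number of past steps (including the current one) that can affect an output."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Hypothetical event-count series and kernels, for illustration only.
series = [0.0, 1.0, 0.0, 3.0, 2.0, 0.0, 5.0, 1.0]
out = tcn_stack(series, kernels=[[0.5, 0.5]] * 3, dilations=[1, 2, 4])
print(out, receptive_field(2, [1, 2, 4]))
```

With kernel size 2 and dilations 1, 2, 4, each output covers 8 past steps; doubling the dilation per layer grows the receptive field exponentially with depth while the number of weights grows only linearly.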

CryptoGRU: Low Latency Privacy-Preserving Text Analysis With GRU

Oct 22, 2020

Bo Feng, Qian Lou, Lei Jiang, Geoffrey C. Fox

Abstract: Billions of text analysis requests containing private emails, personal text messages, and sensitive online reviews are processed every day by recurrent neural networks (RNNs) deployed on public clouds. Although prior secure networks combine homomorphic encryption (HE) and garbled circuits (GC) to preserve users' privacy, naively adopting the HE and GC hybrid technique to implement RNNs suffers from long inference latency due to slow activation functions. In this paper, we present CryptoGRU, an HE and GC hybrid gated recurrent unit (GRU) network for low-latency secure inference. CryptoGRU replaces the computationally expensive GC-based $tanh$ with a fast GC-based $ReLU$, and then quantizes $sigmoid$ and $ReLU$ to a smaller bit length to accelerate activations in a GRU. We evaluate CryptoGRU with multiple GRU models trained on 4 public datasets. Experimental results show that CryptoGRU achieves top-notch accuracy and improves secure inference latency by up to $138\times$ over a state-of-the-art secure network on the Penn Treebank dataset.
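The two activation changes described above can be illustrated in plaintext, with no HE or GC involved. The sketch below is a standard GRU step, h' = (1 − z) ⊙ h + z ⊙ h̃ (some libraries swap the roles of z and 1 − z), in which the candidate state uses ReLU instead of tanh and each sigmoid gate output is rounded to one of 2^bits levels. Quantizing the gate outputs, the 4-bit width, and the uniform parameter values are assumptions made for illustration; in a GC implementation the quantization would apply to the circuit's operands rather than to floating-point values like this.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def quantize_unit(s, bits):
    """Round a value in [0, 1] to one of 2**bits evenly spaced levels."""
    levels = (1 << bits) - 1
    return round(s * levels) / levels


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def gate(p, name, x, h, bits):
    """Quantized sigmoid gate: q(sigmoid(W x + U h + b))."""
    pre = [a + b + c for a, b, c in
           zip(matvec(p["W" + name], x), matvec(p["U" + name], h), p["b" + name])]
    return [quantize_unit(sigmoid(v), bits) for v in pre]


def gru_step_relu(x, h, p, bits=4):
    """One GRU step with ReLU in place of tanh for the candidate state."""
    z = gate(p, "z", x, h, bits)  # update gate
    r = gate(p, "r", x, h, bits)  # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [max(0.0, a + b + c) for a, b, c in
            zip(matvec(p["Wh"], x), matvec(p["Uh"], rh), p["bh"])]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def make_params(hidden, inputs, value=0.0):
    """Hypothetical parameter set with every weight and bias equal to `value`."""
    p = {}
    for g in ("z", "r", "h"):
        p["W" + g] = [[value] * inputs for _ in range(hidden)]
        p["U" + g] = [[value] * hidden for _ in range(hidden)]
        p["b" + g] = [value] * hidden
    return p


p = make_params(hidden=2, inputs=1, value=0.1)
h = [0.0, 0.0]
for x_t in ([0.5], [-1.0], [2.0]):
    h = gru_step_relu(x_t, h, p)
print(h)
```

ReLU needs only a comparison, whereas tanh needs a much larger circuit when evaluated inside a garbled circuit, which is the motivation the abstract gives for the swap.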

AICov: An Integrative Deep Learning Framework for COVID-19 Forecasting with Population Covariates

Oct 08, 2020

Geoffrey C. Fox, Gregor von Laszewski, Fugang Wang, Saumyadipta Pyne

Abstract: The COVID-19 pandemic has had profound global consequences for health, the economy, society, politics, and almost every other major aspect of human life. It is therefore important to model COVID-19 and other pandemics in terms of the broader social contexts in which they take place. We present the architecture of AICov, an integrative deep learning framework for COVID-19 forecasting with population covariates, some of which may serve as putative risk factors. We have integrated multiple strategies into AICov, including deep learning strategies based on LSTM and event modeling. To demonstrate our approach, we conducted a pilot that integrates population covariates from multiple sources. AICov thus includes not only data on COVID-19 cases and deaths but, more importantly, the population's socioeconomic, health, and behavioral risk factors at a local level. The compiled data are fed into AICov, and integrating them improves prediction compared with a model that uses only case and death data.

* 25 pages, 4 tables, 19 figures
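One common way to feed case and death counts together with static population covariates into an LSTM-style sequence model is to repeat the covariates at every time step and cut the series into sliding windows. The sketch below shows that preprocessing step; it is an assumption for illustration, not a description of AICov's actual pipeline, and the function name, series, and covariate values are hypothetical.

```python
def make_windows(cases, deaths, covariates, window, horizon=1):
    """Build (input, target) pairs for a sequence model such as an LSTM.

    Each time step's feature vector is [cases_t, deaths_t, *covariates], where the
    population covariates are static for the region and repeated at every step.
    The target is the case count `horizon` steps after the end of the window.
    """
    if len(cases) != len(deaths):
        raise ValueError("cases and deaths must have the same length")
    features = [[c, d, *covariates] for c, d in zip(cases, deaths)]
    inputs, targets = [], []
    for start in range(len(cases) - window - horizon + 1):
        inputs.append(features[start:start + window])
        targets.append(cases[start + window + horizon - 1])
    return inputs, targets


# Hypothetical county-level series and covariates (for example, the share of
# adults with a given health risk factor and a socioeconomic index).
cases = [3, 5, 8, 13, 21, 34, 55, 89, 144, 233]
deaths = [0, 0, 0, 1, 1, 2, 3, 5, 8, 13]
covariates = [0.31, 0.12]
X, y = make_windows(cases, deaths, covariates, window=3)
print(len(X), X[0], y[0])
```

Passing `covariates=[]` yields the case-and-death-only baseline that the abstract compares against, so both variants share the same windowing code.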

Deep Learning Based Integrators for Solving Newton's Equations with Large Timesteps

May 17, 2020

JCS Kadupitiya, Geoffrey C. Fox, Vikram Jadhao

Abstract: Classical molecular dynamics simulations are based on Newton's equations of motion and rely on numerical integrators to solve them. Using a small timestep to avoid discretization errors, Verlet integrators generate a trajectory of particle positions as solutions to Newton's equations. We introduce an integrator based on deep neural networks that is trained on trajectories generated using the Verlet integrator and learns to propagate the dynamics of particles with a timestep up to 4000$\times$ larger than the Verlet timestep. We demonstrate a significant net speedup of up to 32000$\times$ for 3D systems of 1 to 16 particles over a variety of force fields.

* 14 pages, 11 figures; content is revised
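To make the training setup concrete, the sketch below integrates a 1D harmonic oscillator with the velocity Verlet scheme (one common form of the Verlet integrator) and then pairs each state with the state `stride` steps later. Pairs like these are the kind of supervised data a learned large-timestep integrator could be trained on. The oscillator, the small timestep, and the stride of 100 are hypothetical choices (the abstract reports ratios up to 4000), and the neural network itself is not implemented here.

```python
import math


def velocity_verlet(x, v, force, mass, dt, steps):
    """Integrate m x'' = F(x) with velocity Verlet.

    Returns the list of (position, velocity) pairs, including the initial state.
    """
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt
        a_new = force(x) / mass
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
        trajectory.append((x, v))
    return trajectory


def large_step_pairs(trajectory, stride):
    """(state at t, state at t + stride * dt) pairs for a learned integrator
    whose timestep is `stride` times the Verlet timestep."""
    return [(trajectory[i], trajectory[i + stride])
            for i in range(len(trajectory) - stride)]


# Hypothetical 1D harmonic oscillator with k = m = 1 (period 2*pi).
k, mass, dt = 1.0, 1.0, 1e-3
traj = velocity_verlet(1.0, 0.0, lambda x: -k * x, mass, dt,
                       steps=round(2 * math.pi / dt))
x_end, v_end = traj[-1]
energy = 0.5 * mass * v_end ** 2 + 0.5 * k * x_end ** 2
pairs = large_step_pairs(traj, stride=100)
print(f"x after one period = {x_end:.6f}, energy = {energy:.8f}, pairs = {len(pairs)}")
```

After one period the particle returns close to x = 1 and the energy stays near 0.5, which reflects the good long-time energy behavior that makes Verlet trajectories a reasonable source of training targets.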

Glyph: Fast and Accurately Training Deep Neural Networks on Encrypted Data

Nov 16, 2019

Qian Lou, Bo Feng, Geoffrey C. Fox, Lei Jiang

Abstract: Big data is one of the cornerstones of enabling and training deep neural networks (DNNs). Lacking expertise, average users who want to benefit from their data have to rely on big data companies they may not trust and upload their private data to them. Due to compliance, legal, or privacy constraints, most users are willing to contribute only their encrypted data, and lack the interest or resources to join the training of DNNs in the cloud. To train a DNN on encrypted data in a completely non-interactive way, a recent work proposes a fully homomorphic encryption (FHE)-based technique that implements all activations in the neural network with Brakerski-Gentry-Vaikuntanathan (BGV)-based lookup tables. However, such inefficient lookup-table-based activations significantly prolong the training latency of privacy-preserving DNNs. In this paper, we propose Glyph, an FHE-based scheme that trains DNNs on encrypted data quickly and accurately by switching between the TFHE (Fast Fully Homomorphic Encryption over the Torus) and BGV cryptosystems. Glyph uses logic-operation-friendly TFHE to implement nonlinear activations, while adopting vectorial-arithmetic-friendly BGV to perform multiply-accumulate (MAC) operations. Glyph further applies transfer learning to the training of DNNs to improve test accuracy and reduce the number of ciphertext-ciphertext MAC operations in convolutional layers. Our experimental results show that Glyph obtains state-of-the-art test accuracy while reducing training latency by $99\%$ over the prior FHE-based technique on various encrypted datasets.

* 8 pages, 8 figures
