Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Feng Bao

Department of Mathematics at Florida State University, Tallahassee, Florida, USA

Federated Learning on Stochastic Neural Networks

Jun 09, 2025

Jingqiao Tang, Ryan Bausback, Feng Bao, Richard Archibald

Abstract:Federated learning is a machine learning paradigm that leverages edge computing on client devices to optimize models while maintaining user privacy by ensuring that local data remains on the device. However, since all data is collected by clients, federated learning is susceptible to latent noise in local datasets. Factors such as limited measurement capabilities or human errors may introduce inaccuracies in client data. To address this challenge, we propose the use of a stochastic neural network as the local model within the federated learning framework. Stochastic neural networks not only facilitate the estimation of the true underlying states of the data but also enable the quantification of latent noise. We refer to our federated learning approach, which incorporates stochastic neural networks as local models, as Federated stochastic neural networks. We will present numerical experiments demonstrating the performance and effectiveness of our method, particularly in handling non-independent and identically distributed data.

* 25 pages, 19 figures, Submitted to Journal of Machine Learning for Modeling and Computing

Via

Access Paper or Ask Questions

State Estimation Using Particle Filtering in Adaptive Machine Learning Methods: Integrating Q-Learning and NEAT Algorithms with Noisy Radar Measurements

Apr 10, 2025

Wonjin Song, Feng Bao

Abstract:Reliable state estimation is essential for autonomous systems operating in complex, noisy environments. Classical filtering approaches, such as the Kalman filter, can struggle when facing nonlinear dynamics or non-Gaussian noise, and even more flexible particle filters often encounter sample degeneracy or high computational costs in large-scale domains. Meanwhile, adaptive machine learning techniques, including Q-learning and neuroevolutionary algorithms such as NEAT, rely heavily on accurate state feedback to guide learning; when sensor data are imperfect, these methods suffer from degraded convergence and suboptimal performance. In this paper, we propose an integrated framework that unifies particle filtering with Q-learning and NEAT to explicitly address the challenge of noisy measurements. By refining radar-based observations into reliable state estimates, our particle filter drives more stable policy updates (in Q-learning) or controller evolution (in NEAT), allowing both reinforcement learning and neuroevolution to converge faster, achieve higher returns or fitness, and exhibit greater resilience to sensor uncertainty. Experiments on grid-based navigation and a simulated car environment highlight consistent gains in training stability, final performance, and success rates over baselines lacking advanced filtering. Altogether, these findings underscore that accurate state estimation is not merely a preprocessing step, but a vital component capable of substantially enhancing adaptive machine learning in real-world applications plagued by sensor noise.

Via

Access Paper or Ask Questions

A Scalable Real-Time Data Assimilation Framework for Predicting Turbulent Atmosphere Dynamics

Jul 16, 2024

Junqi Yin, Siming Liang, Siyan Liu, Feng Bao, Hristo G. Chipilski, Dan Lu, Guannan Zhang

Figure 1 for A Scalable Real-Time Data Assimilation Framework for Predicting Turbulent Atmosphere Dynamics

Figure 2 for A Scalable Real-Time Data Assimilation Framework for Predicting Turbulent Atmosphere Dynamics

Figure 3 for A Scalable Real-Time Data Assimilation Framework for Predicting Turbulent Atmosphere Dynamics

Figure 4 for A Scalable Real-Time Data Assimilation Framework for Predicting Turbulent Atmosphere Dynamics

Abstract:The weather and climate domains are undergoing a significant transformation thanks to advances in AI-based foundation models such as FourCastNet, GraphCast, ClimaX and Pangu-Weather. While these models show considerable potential, they are not ready yet for operational use in weather forecasting or climate prediction. This is due to the lack of a data assimilation method as part of their workflow to enable the assimilation of incoming Earth system observations in real time. This limitation affects their effectiveness in predicting complex atmospheric phenomena such as tropical cyclones and atmospheric rivers. To overcome these obstacles, we introduce a generic real-time data assimilation framework and demonstrate its end-to-end performance on the Frontier supercomputer. This framework comprises two primary modules: an ensemble score filter (EnSF), which significantly outperforms the state-of-the-art data assimilation method, namely, the Local Ensemble Transform Kalman Filter (LETKF); and a vision transformer-based surrogate capable of real-time adaptation through the integration of observational data. The ViT surrogate can represent either physics-based models or AI-based foundation models. We demonstrate both the strong and weak scaling of our framework up to 1024 GPUs on the Exascale supercomputer, Frontier. Our results not only illustrate the framework's exceptional scalability on high-performance computing systems, but also demonstrate the importance of supercomputers in real-time data assimilation for weather and climate predictions. Even though the proposed framework is tested only on a benchmark surface quasi-geostrophic (SQG) turbulence system, it has the potential to be combined with existing AI-based foundation models, making it suitable for future operational implementations.

Via

Access Paper or Ask Questions

Convergence analysis of kernel learning FBSDE filter

May 22, 2024

Yunzheng Lyu, Feng Bao

Abstract:Kernel learning forward backward SDE filter is an iterative and adaptive meshfree approach to solve the nonlinear filtering problem. It builds from forward backward SDE for Fokker-Planker equation, which defines evolving density for the state variable, and employs KDE to approximate density. This algorithm has shown more superior performance than mainstream particle filter method, in both convergence speed and efficiency of solving high dimension problems. However, this method has only been shown to converge empirically. In this paper, we present a rigorous analysis to demonstrate its local and global convergence, and provide theoretical support for its empirical results.

Via

Access Paper or Ask Questions

Improving the Expressive Power of Deep Neural Networks through Integral Activation Transform

Dec 19, 2023

Zezhong Zhang, Feng Bao, Guannan Zhang

Abstract:The impressive expressive power of deep neural networks (DNNs) underlies their widespread applicability. However, while the theoretical capacity of deep architectures is high, the practical expressive power achieved through successful training often falls short. Building on the insights gained from Neural ODEs, which explore the depth of DNNs as a continuous variable, in this work, we generalize the traditional fully connected DNN through the concept of continuous width. In the Generalized Deep Neural Network (GDNN), the traditional notion of neurons in each layer is replaced by a continuous state function. Using the finite rank parameterization of the weight integral kernel, we establish that GDNN can be obtained by employing the Integral Activation Transform (IAT) as activation layers within the traditional DNN framework. The IAT maps the input vector to a function space using some basis functions, followed by nonlinear activation in the function space, and then extracts information through the integration with another collection of basis functions. A specific variant, IAT-ReLU, featuring the ReLU nonlinearity, serves as a smooth generalization of the scalar ReLU activation. Notably, IAT-ReLU exhibits a continuous activation pattern when continuous basis functions are employed, making it smooth and enhancing the trainability of the DNN. Our numerical experiments demonstrate that IAT-ReLU outperforms regular ReLU in terms of trainability and better smoothness.

* 26 pages, 6 figures

Via

Access Paper or Ask Questions

Diffusion-Model-Assisted Supervised Learning of Generative Models for Density Estimation

Oct 22, 2023

Yanfang Liu, Minglei Yang, Zezhong Zhang, Feng Bao, Yanzhao Cao, Guannan Zhang

Figure 1 for Diffusion-Model-Assisted Supervised Learning of Generative Models for Density Estimation

Figure 2 for Diffusion-Model-Assisted Supervised Learning of Generative Models for Density Estimation

Figure 3 for Diffusion-Model-Assisted Supervised Learning of Generative Models for Density Estimation

Figure 4 for Diffusion-Model-Assisted Supervised Learning of Generative Models for Density Estimation

Abstract:We present a supervised learning framework of training generative models for density estimation. Generative models, including generative adversarial networks, normalizing flows, variational auto-encoders, are usually considered as unsupervised learning models, because labeled data are usually unavailable for training. Despite the success of the generative models, there are several issues with the unsupervised training, e.g., requirement of reversible architectures, vanishing gradients, and training instability. To enable supervised learning in generative models, we utilize the score-based diffusion model to generate labeled data. Unlike existing diffusion models that train neural networks to learn the score function, we develop a training-free score estimation method. This approach uses mini-batch-based Monte Carlo estimators to directly approximate the score function at any spatial-temporal location in solving an ordinary differential equation (ODE), corresponding to the reverse-time stochastic differential equation (SDE). This approach can offer both high accuracy and substantial time savings in neural network training. Once the labeled data are generated, we can train a simple fully connected neural network to learn the generative model in the supervised manner. Compared with existing normalizing flow models, our method does not require to use reversible neural networks and avoids the computation of the Jacobian matrix. Compared with existing diffusion models, our method does not need to solve the reverse-time SDE to generate new samples. As a result, the sampling efficiency is significantly improved. We demonstrate the performance of our method by applying it to a set of 2D datasets as well as real data from the UCI repository.

Via

Access Paper or Ask Questions

An Ensemble Score Filter for Tracking High-Dimensional Nonlinear Dynamical Systems

Sep 02, 2023

Feng Bao, Zezhong Zhang, Guannan Zhang

Abstract:We propose an ensemble score filter (EnSF) for solving high-dimensional nonlinear filtering problems with superior accuracy. A major drawback of existing filtering methods, e.g., particle filters or ensemble Kalman filters, is the low accuracy in handling high-dimensional and highly nonlinear problems. EnSF attacks this challenge by exploiting the score-based diffusion model, defined in a pseudo-temporal domain, to characterizing the evolution of the filtering density. EnSF stores the information of the recursively updated filtering density function in the score function, in stead of storing the information in a set of finite Monte Carlo samples (used in particle filters and ensemble Kalman filters). Unlike existing diffusion models that train neural networks to approximate the score function, we develop a training-free score estimation that uses mini-batch-based Monte Carlo estimator to directly approximate the score function at any pseudo-spatial-temporal location, which provides sufficient accuracy in solving high-dimensional nonlinear problems as well as saves tremendous amount of time spent on training neural networks. Another essential aspect of EnSF is its analytical update step, gradually incorporating data information into the score function, which is crucial in mitigating the degeneracy issue faced when dealing with very high-dimensional nonlinear filtering problems. High-dimensional Lorenz systems are used to demonstrate the performance of our method. EnSF provides surprisingly impressive performance in reliably tracking extremely high-dimensional Lorenz systems (up to 1,000,000 dimension) with highly nonlinear observation processes, which is a well-known challenging problem for existing filtering methods.

* arXiv admin note: text overlap with arXiv:2306.09282

Via

Access Paper or Ask Questions

TransNet: Transferable Neural Networks for Partial Differential Equations

Jan 27, 2023

Zezhong Zhang, Feng Bao, Lili Ju, Guannan Zhang

Abstract:Transfer learning for partial differential equations (PDEs) is to develop a pre-trained neural network that can be used to solve a wide class of PDEs. Existing transfer learning approaches require much information of the target PDEs such as its formulation and/or data of its solution for pre-training. In this work, we propose to construct transferable neural feature spaces from purely function approximation perspectives without using PDE information. The construction of the feature space involves re-parameterization of the hidden neurons and uses auxiliary functions to tune the resulting feature space. Theoretical analysis shows the high quality of the produced feature space, i.e., uniformly distributed neurons. Extensive numerical experiments verify the outstanding performance of our method, including significantly improved transferability, e.g., using the same feature space for various PDEs with different domains and boundary conditions, and the superior accuracy, e.g., several orders of magnitude smaller mean squared error than the state of the art methods.

Via

Access Paper or Ask Questions

Convergence Analysis for Training Stochastic Neural Networks via Stochastic Gradient Descent

Dec 17, 2022

Richard Archibald, Feng Bao, Yanzhao Cao, Hui Sun

Abstract:In this paper, we carry out numerical analysis to prove convergence of a novel sample-wise back-propagation method for training a class of stochastic neural networks (SNNs). The structure of the SNN is formulated as discretization of a stochastic differential equation (SDE). A stochastic optimal control framework is introduced to model the training procedure, and a sample-wise approximation scheme for the adjoint backward SDE is applied to improve the efficiency of the stochastic optimal control solver, which is equivalent to the back-propagation for training the SNN. The convergence analysis is derived with and without convexity assumption for optimization of the SNN parameters. Especially, our analysis indicates that the number of SNN training steps should be proportional to the square of the number of layers in the convex optimization case. Numerical experiments are carried out to validate the analysis results, and the performance of the sample-wise back-propagation method for training SNNs is examined by benchmark machine learning examples.

Via

Access Paper or Ask Questions

A Kernel Learning Method for Backward SDE Filter

Jan 25, 2022

Richard Archibald, Feng Bao

Abstract:In this paper, we develop a kernel learning backward SDE filter method to estimate the state of a stochastic dynamical system based on its partial noisy observations. A system of forward backward stochastic differential equations is used to propagate the state of the target dynamical model, and Bayesian inference is applied to incorporate the observational information. To characterize the dynamical model in the entire state space, we introduce a kernel learning method to learn a continuous global approximation for the conditional probability density function of the target state by using discrete approximated density values as training data. Numerical experiments demonstrate that the kernel learning backward SDE is highly effective and highly efficient.

Via

Access Paper or Ask Questions