Abstract:Solving Singularly Perturbed Differential Equations (SPDEs) poses computational challenges arising from the rapid transitions in their solutions within thin regions. The effectiveness of deep learning in addressing differential equations motivates us to employ these methods for solving SPDEs. In this manuscript, we introduce Component Fourier Neural Operator (ComFNO), an innovative operator learning method that builds upon Fourier Neural Operator (FNO), while simultaneously incorporating valuable prior knowledge obtained from asymptotic analysis. Our approach is not limited to FNO and can be applied to other neural network frameworks, such as Deep Operator Network (DeepONet), leading to potential similar SPDEs solvers. Experimental results across diverse classes of SPDEs demonstrate that ComFNO significantly improves accuracy compared to vanilla FNO. Furthermore, ComFNO exhibits natural adaptability to diverse data distributions and performs well in few-shot scenarios, showcasing its excellent generalization ability in practical situations.
Abstract:First-order methods, such as gradient descent (GD) and stochastic gradient descent (SGD), have been proven effective in training neural networks. In the context of over-parameterization, there is a line of work demonstrating that randomly initialized (stochastic) gradient descent converges to a globally optimal solution at a linear convergence rate for the quadratic loss function. However, the learning rate of GD for training two-layer neural networks exhibits poor dependence on the sample size and the Gram matrix, leading to a slow training process. In this paper, we show that for the $L^2$ regression problems, the learning rate can be improved from $\mathcal{O}(\lambda_0/n^2)$ to $\mathcal{O}(1/\|\bm{H}^{\infty}\|_2)$, which implies that GD actually enjoys a faster convergence rate. Furthermore, we generalize the method to GD in training two-layer Physics-Informed Neural Networks (PINNs), showing a similar improvement for the learning rate. Although the improved learning rate has a mild dependence on the Gram matrix, we still need to set it small enough in practice due to the unknown eigenvalues of the Gram matrix. More importantly, the convergence rate is tied to the least eigenvalue of the Gram matrix, which can lead to slow convergence. In this work, we provide the convergence analysis of natural gradient descent (NGD) in training two-layer PINNs, demonstrating that the learning rate can be $\mathcal{O}(1)$, and at this rate, the convergence rate is independent of the Gram matrix.