Abstract: We show that $d$-variate polynomials of degree $R$ can be represented on $[0,1]^d$ as shallow neural networks of width $2(R+d)^d$. Also, using the shallow network representation of localized Taylor polynomials of univariate $C^\beta$-smooth functions, we derive for shallow networks the minimax optimal rate of convergence, up to a logarithmic factor, to an unknown univariate regression function.
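As a small hedged illustration of the scale of the claimed width (not part of the paper's argument), the script below compares $2(R+d)^d$ with the dimension $\binom{R+d}{d}$ of the space of $d$-variate polynomials of degree at most $R$ for a few choices of $d$ and $R$; the specific values are chosen only for this example.

```python
from math import comb

# Compare the width 2*(R+d)^d claimed in the abstract with the number of
# monomials of degree <= R in d variables, i.e. binom(R+d, d), which is the
# dimension of the polynomial space being represented.
for d in (1, 2, 3):
    for R in (2, 4, 8):
        width = 2 * (R + d) ** d        # claimed shallow-network width
        monomials = comb(R + d, d)      # dimension of the polynomial space
        print(f"d={d}, R={R}: width 2(R+d)^d = {width}, #monomials = {monomials}")
```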
Abstract: We consider regression estimation with modified ReLU neural networks, in which the network weight matrices are first modified by a function $\alpha$ before being multiplied by input vectors. We give an example of a continuous, piecewise linear function $\alpha$ for which the empirical risk minimizers over the classes of modified ReLU networks with $l_1$ and squared $l_2$ penalties attain, up to a logarithmic factor, the minimax rate of prediction of an unknown $\beta$-smooth function.
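A minimal sketch of the described layer type, assuming the modification is applied entrywise to each weight matrix before the matrix-vector product. The clipping map used for $\alpha$ below is only a placeholder continuous, piecewise linear function; it is not the particular $\alpha$ constructed in the paper.

```python
import torch
import torch.nn as nn

class ModifiedLinear(nn.Module):
    """Linear layer whose weight matrix is passed through a map alpha
    before multiplying the input, as in the modified ReLU networks
    described in the abstract (placement of alpha is entrywise here,
    which is an assumption of this sketch)."""

    def __init__(self, in_features, out_features, alpha):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.alpha = alpha

    def forward(self, x):
        return x @ self.alpha(self.weight).T + self.bias

# Placeholder piecewise linear alpha (hard clipping to [-1, 1]);
# NOT the function constructed in the paper.
alpha = lambda w: torch.clamp(w, -1.0, 1.0)

net = nn.Sequential(
    ModifiedLinear(1, 16, alpha), nn.ReLU(),
    ModifiedLinear(16, 1, alpha),
)
y = net(torch.rand(8, 1))  # evaluate on 8 univariate inputs
```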
Abstract: We show that deep sparse ReLU networks with ternary weights and deep ReLU networks with binary weights can approximate $\beta$-H\"older functions on $[0,1]^d$. Also, continuous functions on $[0,1]^d$ can be approximated by networks of depth $2$ with the binary activation function $\mathds{1}_{[0,1)}$.
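A toy sketch of the second claim for $d=1$: a single hidden layer of units with activation $\mathds{1}_{[0,1)}$ can tile $[0,1)$ into intervals, so the output layer produces a piecewise constant approximation of a continuous function. This is only an illustration of the type of construction; the paper's networks and weight constraints are not reproduced.

```python
import numpy as np

def binary_act(t):
    """Activation 1_{[0,1)}(t)."""
    return ((t >= 0) & (t < 1)).astype(float)

def depth2_approx(f, N):
    """Depth-2 network with hidden width N: hidden unit k fires exactly on
    [k/N, (k+1)/N), and the output weights are the samples f(k/N). This
    gives a piecewise constant approximation of f on [0,1) for d = 1."""
    ks = np.arange(N)
    def net(x):
        hidden = binary_act(N * x[:, None] - ks[None, :])  # shape (n, N)
        return hidden @ f(ks / N)
    return net

f = np.sin
net = depth2_approx(f, N=100)
x = np.linspace(0, 0.999, 1000)
print(np.max(np.abs(net(x) - f(x))))  # uniform error of order 1/N for Lipschitz f
```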
Abstract: An example of an activation function $\sigma$ is given such that networks with activations $\{\sigma, \lfloor\cdot\rfloor\}$, integer weights and a fixed architecture depending on $d$ approximate continuous functions on $[0,1]^d$. The range of integer weights required for $\varepsilon$-approximation of H\"older continuous functions is derived, which leads to a convergence rate of order $n^{-\frac{2\beta}{2\beta+d}}\log_2 n$ for neural network regression estimation of an unknown $\beta$-H\"older continuous function from $n$ given samples.
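For intuition about how the stated rate scales with the sample size, the snippet below simply evaluates $n^{-\frac{2\beta}{2\beta+d}}\log_2 n$ for a few values of $n$; the choices $\beta=1$, $d=2$ are assumptions made only for this illustration.

```python
from math import log2

# Evaluate the rate n^(-2*beta/(2*beta+d)) * log2(n) from the abstract
# for illustrative values of beta, d and n.
beta, d = 1.0, 2.0
for n in (10**3, 10**4, 10**5, 10**6):
    rate = n ** (-2 * beta / (2 * beta + d)) * log2(n)
    print(f"n = {n:>7}: rate = {rate:.4f}")
```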
Abstract: We provide an entropy bound for the spaces of neural networks with piecewise linear activation functions, such as the ReLU and the absolute value functions. This bound generalizes the known entropy bound for the space of linear functions on $\mathbb{R}^d$, and it depends on the value at the point $(1,1,\ldots,1)$ of the networks obtained by taking the absolute values of all parameters of the original networks. Requiring this value, together with the depth, width and the parameters of the networks, to have logarithmic dependence on $1/\varepsilon$, we $\varepsilon$-approximate functions that are analytic on certain regions of $\mathbb{C}^d$. As a statistical application, we derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
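A short sketch of the quantity the bound depends on: take the absolute values of all weights and biases and evaluate the resulting network at $(1,1,\ldots,1)$. The fully connected architecture, the random parameters and the placement of the ReLU activations below are assumptions chosen only to make the example concrete.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def forward(weights, biases, x):
    """Evaluate a fully connected ReLU network (ReLU on hidden layers,
    identity on the output layer)."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(W @ x + b)
    return weights[-1] @ x + biases[-1]

# A small random network on R^3 (sizes chosen only for illustration).
rng = np.random.default_rng(0)
sizes = [3, 8, 8, 1]
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=m) for m in sizes[1:]]

# Quantity described in the abstract: absolute values of all parameters,
# evaluated at the all-ones input.
abs_weights = [np.abs(W) for W in weights]
abs_biases = [np.abs(b) for b in biases]
value = forward(abs_weights, abs_biases, np.ones(sizes[0]))
print(value)
```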
Abstract: In this paper it is shown that $C_\beta$-smooth functions can be approximated by neural networks with parameters in $\{0, \pm \frac{1}{2}, \pm 1, 2\}$. The depth, width and the number of active parameters of the constructed networks have, up to a logarithmic factor, the same dependence on the approximation error as the networks with parameters in $[-1,1]$. In particular, this means that nonparametric regression estimation with the constructed networks attains the same convergence rate as with the sparse networks with parameters in $[-1,1]$.
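A toy sketch of why such a small discrete parameter set can emulate continuous weights in $[-1,1]$: any $w\in[-1,1]$ admits a signed binary expansion $\sum_k b_k 2^{-k}$ with $b_k\in\{0,\pm 1\}$, and each factor $2^{-k}$ is a product of $k$ copies of the allowed value $\frac{1}{2}$, which a network can realize with additional edges or layers. This is not the paper's construction, only an illustration of the quantization idea.

```python
# Signed binary expansion of a weight w in [-1, 1] to precision 2**-K,
# using digits in {-1, 0, +1} and powers of 1/2 (both available in the
# parameter set {0, +-1/2, +-1, 2} of the abstract).

def signed_binary_expansion(w, K):
    """Return digits b_1, ..., b_K in {-1, 0, 1} such that
    |w - sum_k b_k * 2**-k| <= 2**-K."""
    digits, residual = [], w
    for k in range(1, K + 1):
        step = 2.0 ** -k
        if residual >= step / 2:
            digits.append(1); residual -= step
        elif residual <= -step / 2:
            digits.append(-1); residual += step
        else:
            digits.append(0)
    return digits

w, K = 0.7233, 20
digits = signed_binary_expansion(w, K)
approx = sum(b * 2.0 ** -(k + 1) for k, b in enumerate(digits))
print(abs(w - approx) <= 2.0 ** -K)  # True
```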