Abstract:In a groundbreaking work, Schmidt-Hieber (2020) proved the minimax optimality of deep neural networks with ReLu activation for least-square regression estimation over a large class of functions defined by composition. In this paper, we extend these results in many directions. First, we remove the i.i.d. assumption on the observations, to allow some time dependence. The observations are assumed to be a Markov chain with a non-null pseudo-spectral gap. Then, we study a more general class of machine learning problems, which includes least-square and logistic regression as special cases. Leveraging on PAC-Bayes oracle inequalities and a version of Bernstein inequality due to Paulin (2015), we derive upper bounds on the estimation risk for a generalized Bayesian estimator. In the case of least-square regression, this bound matches (up to a logarithmic factor) the lower bound of Schmidt-Hieber (2020). We establish a similar lower bound for classification with the logistic loss, and prove that the proposed DNN estimator is optimal in the minimax sense.
Abstract:The explicit regularization and optimality of deep neural networks estimators from independent data have made considerable progress recently. The study of such properties on dependent data is still a challenge. In this paper, we carry out deep learning from strongly mixing observations, and deal with the squared and a broad class of loss functions. We consider sparse-penalized regularization for deep neural network predictor. For a general framework that includes, regression estimation, classification, time series prediction,$\cdots$, oracle inequality for the expected excess risk is established and a bound on the class of H\"older smooth functions is provided. For nonparametric regression from strong mixing data and sub-exponentially error, we provide an oracle inequality for the $L_2$ error and investigate an upper bound of this error on a class of H\"older composition functions. For the specific case of nonparametric autoregression with Gaussian and Laplace errors, a lower bound of the $L_2$ error on this H\"older composition class is established. Up to logarithmic factor, this bound matches its upper bound; so, the deep neural network estimator attains the minimax optimal rate.
Abstract:Recent developments on deep learning established some theoretical properties of deep neural networks estimators. However, most of the existing works on this topic are restricted to bounded loss functions or (sub)-Gaussian or bounded input. This paper considers robust deep learning from weakly dependent observations, with unbounded loss function and unbounded input/output. It is only assumed that the output variable has a finite $r$ order moment, with $r >1$. Non asymptotic bounds for the expected excess risk of the deep neural network estimator are established under strong mixing, and $\psi$-weak dependence assumptions on the observations. We derive a relationship between these bounds and $r$, and when the data have moments of any order (that is $r=\infty$), the convergence rate is close to some well-known results. When the target predictor belongs to the class of H\"older smooth functions with sufficiently large smoothness index, the rate of the expected excess risk for exponentially strongly mixing data is close to or as same as those for obtained with i.i.d. samples. Application to robust nonparametric regression and robust nonparametric autoregression are considered. The simulation study for models with heavy-tailed errors shows that, robust estimators with absolute loss and Huber loss function outperform the least squares method.
Abstract:This paper carries out sparse-penalized deep neural networks predictors for learning weakly dependent processes, with a broad class of loss functions. We deal with a general framework that includes, regression estimation, classification, times series prediction, $\cdots$ The $\psi$-weak dependence structure is considered, and for the specific case of bounded observations, $\theta_\infty$-coefficients are also used. In this case of $\theta_\infty$-weakly dependent, a non asymptotic generalization bound within the class of deep neural networks predictors is provided. For learning both $\psi$ and $\theta_\infty$-weakly dependent processes, oracle inequalities for the excess risk of the sparse-penalized deep neural networks estimators are established. When the target function is sufficiently smooth, the convergence rate of these excess risk is close to $\mathcal{O}(n^{-1/3})$. Some simulation results are provided, and application to the forecast of the particulate matter in the Vit\'{o}ria metropolitan area is also considered.
Abstract:We consider the nonparametric regression and the classification problems for $\psi$-weakly dependent processes. This weak dependence structure is more general than conditions such as, mixing, association, $\ldots$. A penalized estimation method for sparse deep neural networks is performed. In both nonparametric regression and binary classification problems, we establish oracle inequalities for the excess risk of the sparse-penalized deep neural networks estimators. Convergence rates of the excess risk of these estimators are also derived. The simulation results displayed show that, the proposed estimators overall work well than the non penalized estimators.
Abstract:This paper considers deep neural networks for learning weakly dependent processes in a general framework that includes, for instance, regression estimation, time series prediction, time series classification. The $\psi$-weak dependence structure considered is quite large and covers other conditions such as mixing, association,$\ldots$ Firstly, the approximation of smooth functions by deep neural networks with a broad class of activation functions is considered. We derive the required depth, width and sparsity of a deep neural network to approximate any H\"{o}lder smooth function, defined on any compact set $\mx$. Secondly, we establish a bound of the excess risk for the learning of weakly dependent observations by deep neural networks. When the target function is sufficiently smooth, this bound is close to the usual $\mathcal{O}(n^{-1/2})$.
Abstract:In this paper, we perform deep neural networks for learning $\psi$-weakly dependent processes. Such weak-dependence property includes a class of weak dependence conditions such as mixing, association,$\cdots$ and the setting considered here covers many commonly used situations such as: regression estimation, time series prediction, time series classification,$\cdots$ The consistency of the empirical risk minimization algorithm in the class of deep neural networks predictors is established. We achieve the generalization bound and obtain a learning rate, which is less than $\mathcal{O}(n^{-1/\alpha})$, for all $\alpha > 2 $. Applications to binary time series classification and prediction in affine causal models with exogenous covariates are carried out. Some simulation results are provided, as well as an application to the US recession data.