We introduce a conceptual framework for numerically solving linear elliptic, parabolic, and hyperbolic PDEs on bounded, polytopal domains in euclidean spaces by deep neural networks. The PDEs are recast as minimization of a least-squares (LSQ for short) residual of an equivalent, well-posed first-order system, over parametric families of deep neural networks. The associated LSQ residual is a) equal or proportional to a weak residual of the PDE, b) additive in terms of contributions from localized subnetworks, indicating locally ``out-of-equilibrium'' of neural networks with respect to the PDE residual, c) serves as numerical loss function for neural network training, and d) constitutes, even with incomplete training, a computable, (quasi-)optimal numerical error estimator in the context of adaptive LSQ finite element methods. In addition, an adaptive neural network growth strategy is proposed which, assuming exact numerical minimization of the LSQ loss functional, yields sequences of neural networks with realizations that converge rate-optimally to the exact solution of the first order system LSQ formulation.