Harnessing data to discover the underlying governing laws or equations that describe the behavior of complex physical systems can significantly advance our modeling, simulation and understanding of such systems in various science and engineering disciplines. Recent advances in sparse identification show encouraging success in distilling closed-form governing equations from data for a wide range of nonlinear dynamical systems. However, the fundamental bottleneck of this approach lies in the robustness and scalability with respect to data scarcity and noise. This work introduces a novel physics-informed deep learning framework to discover governing partial differential equations (PDEs) from scarce and noisy data for nonlinear spatiotemporal systems. In particular, this approach seamlessly integrates the strengths of deep neural networks for rich representation learning, automatic differentiation and sparse regression to approximate the solution of system variables, compute essential derivatives, as well as identify the key derivative terms and parameters that form the structure and explicit expression of the PDEs. The efficacy and robustness of this method are demonstrated on discovering a variety of PDE systems with different levels of data scarcity and noise. The resulting computational framework shows the potential for closed-form model discovery in practical applications where large and accurate datasets are intractable to capture.