Uncertainty quantification is at the core of the reliability and robustness of machine learning. It is well known that uncertainty comes in two types, commonly referred to as aleatoric and epistemic uncertainty. In this paper, we provide a systematic study of epistemic uncertainty in deep supervised learning. We rigorously distinguish different sources of epistemic uncertainty, in particular procedural variability (arising from the training procedure) and data variability (arising from the training data). We use our framework to explain how deep ensembles enhance prediction by reducing procedural variability. We also propose two approaches to estimating epistemic uncertainty for a well-trained neural network in practice. The first uses an influence function derived from the theory of the neural tangent kernel, which bypasses the convexity assumption violated by modern neural networks. The second uses batching, which bypasses the time-consuming Gram matrix inversion in the influence function calculation while expending minimal re-training effort. We discuss how both approaches overcome some difficulties in applying classical statistical methods to inference in deep learning.