Spiking neural networks (SNNs) are attracting growing attention as a promising route to energy-efficient computing on emerging neuromorphic hardware. So far, however, SNNs have not shown performance competitive with artificial neural networks (ANNs), owing to the lack of effective learning algorithms and efficient programming frameworks. We address this issue from two aspects: (1) We propose a neuron normalization technique to adjust neural selectivity and develop a direct learning algorithm for large-scale SNNs. (2) We present a PyTorch-based implementation method for training deep SNNs, obtained by narrowing the rate coding window and converting the leaky integrate-and-fire (LIF) model into an explicitly iterative version. With this method, we are able to train large-scale SNNs with a speedup of tens of times. As a result, we achieve significantly better accuracy than previously reported works on neuromorphic datasets (N-MNIST and DVS-CIFAR10), and accuracy comparable to existing ANNs and pre-trained SNNs on a non-spiking dataset (CIFAR10). To the best of our knowledge, this is the first work that demonstrates direct training of large-scale SNNs with high performance, and the efficient implementation is a key step toward exploring the potential of SNNs.
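To make the idea of an explicitly iterative LIF model concrete, the following is a minimal sketch in plain Python of one common discrete-time form: the membrane potential decays, is reset if the neuron fired on the previous step, and then integrates the new input. The decay factor, the firing threshold, the hard reset to zero, and the names `lif_step` and `run_lif` are assumptions chosen for illustration; they are not taken from the text above and may differ from the formulation actually used.

```python
def lif_step(u, o, x, decay=0.5, threshold=1.0):
    """Advance a layer of LIF neurons by one discrete time step.

    u: membrane potentials after the previous step (list of floats)
    o: spikes emitted at the previous step (list of 0/1)
    x: input current at the current step (list of floats)
    decay and threshold are illustrative values (assumed, not from the source).
    """
    new_u, new_o = [], []
    for u_i, o_i, x_i in zip(u, o, x):
        # Leak the old potential, zero it if the neuron just fired
        # (assumed hard reset), then integrate the new input.
        u_next = decay * u_i * (1 - o_i) + x_i
        # Emit a spike when the potential reaches the threshold.
        o_next = 1 if u_next >= threshold else 0
        new_u.append(u_next)
        new_o.append(o_next)
    return new_u, new_o


def run_lif(inputs, decay=0.5, threshold=1.0):
    """Unroll the LIF update over time; inputs[t] is the input at step t."""
    n = len(inputs[0])
    u, o = [0.0] * n, [0] * n  # start at rest with no previous spikes
    spikes = []
    for x in inputs:
        u, o = lif_step(u, o, x, decay, threshold)
        spikes.append(o)
    return spikes


if __name__ == "__main__":
    # One neuron driven by a constant sub-threshold input for four steps.
    print(run_lif([[0.6], [0.6], [0.6], [0.6]]))
```

Because every state at one step depends only on the state at the previous step and the current input, the loop can be unrolled over a short time window and expressed with ordinary tensor operations, which is what makes such a form convenient in a framework like PyTorch.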