Federated learning is a new distributed learning paradigm that enables efficient training of emerging large-scale machine learning models. In this paper, we consider federated learning on non-convex objectives with compressed communication from the clients to the central server. We propose a novel first-order algorithm (\texttt{FedSTEPH2}) that employs compressed communication and achieves the optimal iteration complexity of $\mathcal{O}(1/\epsilon^{1.5})$ to reach an $\epsilon$-stationary point (i.e. $\mathbb{E}[\|\nabla f(\bm{x})\|^2] \leq \epsilon$) on smooth non-convex objectives. The proposed scheme is the first algorithm that attains the aforementioned optimal complexity with compressed communication and without using full client gradients at each communication round. The key idea of \texttt{FedSTEPH2} that enables attaining this optimal complexity is applying judicious momentum terms both in the local client updates and the global server update. As a prequel to \texttt{FedSTEPH2}, we propose \texttt{FedSTEPH} which involves a momentum term only in the local client updates. We establish that \texttt{FedSTEPH} enjoys improved convergence rates under various non-convex settings (such as the Polyak-\L{}ojasiewicz condition) and with fewer assumptions than prior work.