Federated learning (FL) has become a popular means for distributed learning at clients using local data samples. However, recent studies have shown that FL may experience slow learning and poor performance when client data are not independent and identically distributed (IID). This paper proposes a new federated learning algorithm in which the central server has access to a small dataset, learns from it, and fuses the resulting knowledge into the global model through the federated learning process. This new approach, referred to as Federated learning with Server Learning (FSL), is complementary to, and can be combined with, other FL algorithms. We prove the convergence of FSL and demonstrate its benefits through analysis and simulations. We also reveal an inherent trade-off. When the current model is far from any local minimizer, server learning can significantly improve and accelerate FL. When the model is close to a local minimizer, however, server learning could affect the convergence neighborhood of FL because of the variance in the gradient estimated by the server. We show via simulations that this trade-off can be tuned easily to provide significant benefits, even when the server dataset is very small.
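To make the idea concrete, the following toy sketch shows one possible reading of a round that combines federated averaging with a server learning step. It is not the algorithm analyzed in the paper: the scalar quadratic loss, the FedAvg-style aggregation, the ordering of the two steps, and all names and parameters (`local_sgd`, `fsl_round`, `train`, `server_lr`, `server_steps`) are assumptions made purely for illustration.

```python
def local_sgd(w, data, lr, steps):
    """Run `steps` gradient steps on the loss mean(0.5 * (w - x)^2) over `data`."""
    for _ in range(steps):
        grad = sum(w - x for x in data) / len(data)
        w -= lr * grad
    return w


def fsl_round(w, client_data, server_data, lr=0.1, local_steps=5,
              server_lr=0.05, server_steps=2):
    """One illustrative round: client training, averaging, then server learning."""
    # Each client starts from the current global model and trains on its own data.
    client_models = [local_sgd(w, d, lr, local_steps) for d in client_data]
    # Aggregate the client models, weighted by local dataset size (FedAvg-style).
    total = sum(len(d) for d in client_data)
    w = sum(len(d) * m for d, m in zip(client_data, client_models)) / total
    # Assumed server learning step: a few gradient steps on the small server dataset.
    # server_lr and server_steps control how strongly the server pulls the model.
    return local_sgd(w, server_data, server_lr, server_steps)


def train(w0, client_data, server_data, rounds=50, **kwargs):
    """Repeat fsl_round for a fixed number of rounds and return the final model."""
    w = w0
    for _ in range(rounds):
        w = fsl_round(w, client_data, server_data, **kwargs)
    return w


# Hypothetical non-IID clients: each sees a different region of the data.
clients = [[-2.0, -1.0], [0.0, 1.0], [4.0, 4.0]]   # pooled mean is 1.0
server = [0.5, 2.0]                                # small server sample, mean 1.25

w_fsl = train(10.0, clients, server)
w_fl = train(10.0, clients, server, server_steps=0)  # server learning switched off
```

In this toy setting the server sample mean differs slightly from the pooled client mean, so the run with server learning settles near, but not exactly at, the pooled minimizer, while reducing `server_lr` or `server_steps` shrinks that offset. This loosely mirrors the trade-off described above, under the stated assumptions only.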