With increasing concerns for data privacy and ownership, recent years have witnessed a paradigm shift in machine learning (ML). An emerging paradigm, federated learning (FL), has gained great attention and has become a novel design for machine learning implementations. FL enables the ML model training at data silos under the coordination of a central server, eliminating communication overhead and without sharing raw data. In this paper, we conduct a review of the FL paradigm and, in particular, compare the types, the network structures, and the global model aggregation methods. Then, we conducted a comprehensive review of FL applications in the energy domain (refer to the smart grid in this paper). We provide a thematic classification of FL to address a variety of energy-related problems, including demand response, identification, prediction, and federated optimizations. We describe the taxonomy in detail and conclude with a discussion of various aspects, including challenges, opportunities, and limitations in its energy informatics applications, such as energy system modeling and design, privacy, and evolution.