In this paper, we study cascading failures in power grids through the lens of information diffusion models. Similar to the spread of rumors or influence in an online social network, it has been observed that failures (outages) in a power grid can spread contagiously, driven by viral spread mechanisms. We employ a stochastic diffusion model that is Markovian (memoryless) and local (the activation of one node, i.e., transmission line, can only be caused by its neighbors). Our model integrates viral diffusion principles with physics-based concepts, by correlating the diffusion weights (contagion probabilities between transmission lines) with the hyperparametric Information Cascades (IC) model. We show that this diffusion model can be learned from traces of cascading failures, enabling accurate modeling and prediction of failure propagation. This approach facilitates actionable information through well-understood and efficient graph analysis methods and graph diffusion simulations. Furthermore, by leveraging the hyperparametric model, we can predict diffusion and mitigate the risks of cascading failures even in unseen grid configurations, whereas existing methods falter due to a lack of training data. Extensive experiments based on a benchmark power grid and simulations therein show that our approach effectively captures the failure diffusion phenomena and guides decisions to strengthen the grid, reducing the risk of large-scale cascading failures. Additionally, we characterize our model's sample complexity, improving upon the existing bound.