Abstract: This paper presents a novel data-driven approach for predicting the monthly number of vegetation-related outages in power distribution systems. Two main challenges must be addressed to achieve this objective. The first is defining the extent of the target area, for which an unsupervised machine learning approach is proposed. The second is correctly identifying the main causes of vegetation-related outages and thoroughly investigating their nature. In this paper, these outages are categorized into two groups, growth-related and weather-related, which are predicted using time series models and non-linear machine learning regression models, respectively. Moreover, various features that can explain the variability in vegetation-related outages are engineered and employed. Actual outage data obtained from a major utility in the U.S., together with different types of weather and geographical data, are used to build the proposed approach. Finally, a comprehensive case study demonstrates how the proposed approach can be used to predict the number of vegetation-related outages and to help decision-makers detect vulnerable zones in their systems.