Operations and maintenance (O&M) is a fundamental problem in wind energy systems with far reaching implications for reliability and profitability. Optimizing O&M is a multi-faceted decision optimization problem that requires a careful balancing act across turbine level failure risks, operational revenues, and maintenance crew logistics. The resulting O&M problems are typically solved using large-scale mixed integer programming (MIP) models, which yield computationally challenging problems that require either long-solution times, or heuristics to reach a solution. To address this problem, we introduce a novel decision-making framework for wind farm O&M that builds on a multi-head attention (MHA) models, an emerging artificial intelligence methods that are specifically designed to learn in rich and complex problem settings. The development of proposed MHA framework incorporates a number of modeling innovations that allows explicit embedding of MIP models within an MHA structure. The proposed MHA model (i) significantly reduces the solution time from hours to seconds, (ii) guarantees feasibility of the proposed solutions considering complex constraints that are omnipresent in wind farm O&M, (iii) results in significant solution quality compared to the conventional MIP formulations, and (iv) exhibits significant transfer learning capability across different problem settings.