The substantial effect of air pollution on cardiovascular disease and mortality burdens is well-established. Emissions-reducing interventions on coal-fired power plants -- a major source of hazardous air pollution -- have proven to be an effective, but costly, strategy for reducing pollution-related health burdens. Targeting the power plants that achieve maximum health benefits while satisfying realistic cost constraints is challenging. The primary difficulty lies in quantifying the health benefits of intervening at particular plants. This is further complicated because interventions are applied on power plants, while health impacts occur in potentially distant communities, a setting known as bipartite network interference (BNI). In this paper, we introduce novel policy learning methods based on Q- and A-Learning to determine the optimal policy under arbitrary BNI. We derive asymptotic properties and demonstrate finite sample efficacy in simulations. We apply our novel methods to a comprehensive dataset of Medicare claims, power plant data, and pollution transport networks. Our goal is to determine the optimal strategy for installing power plant scrubbers to minimize ischemic heart disease (IHD) hospitalizations under various cost constraints. We find that annual IHD hospitalization rates could be reduced in a range from 20.66-44.51 per 10,000 person-years through optimal policies under different cost constraints.