Hundreds of millions of people lack access to electricity. Decentralised solar-battery systems are key to addressing this whilst avoiding carbon emissions and air pollution, but are hindered by relatively high costs and rural locations that inhibit timely preventative maintenance. Accurate diagnosis of battery health and prediction of end of life from operational data improves user experience and reduces costs. However, the lack of controlled validation tests and variable data quality mean that existing lab-based techniques fail to work. We apply a scalable probabilistic machine learning approach to diagnose health in 1027 solar-connected lead-acid batteries, each running for 400-760 days, totalling 620 million data rows. We demonstrate prediction of end of life with 73% accuracy eight weeks in advance, rising to 82% at the point of failure. This work highlights the opportunity to estimate health from existing measurements using 'big data' techniques, without additional equipment, extending lifetime and improving performance in real-world applications.