Abstract:Rough sleeping is a chronic problem faced by some of the most disadvantaged people in modern society. This paper describes work carried out in partnership with Homeless Link, a UK-based charity, in developing a data-driven approach to assess the quality of incoming alerts from members of the public aimed at connecting people sleeping rough on the streets with outreach service providers. Alerts are prioritised based on the predicted likelihood of successfully connecting with the rough sleeper, helping to address capacity limitations and to quickly, effectively, and equitably process all of the alerts that they receive. Initial evaluation concludes that our approach increases the rate at which rough sleepers are found following a referral by at least 15\% based on labelled data, implying a greater overall increase when the alerts with unknown outcomes are considered, and suggesting the benefit in a trial taking place over a longer period to assess the models in practice. The discussion and modelling process is done with careful considerations of ethics, transparency and explainability due to the sensitive nature of the data in this context and the vulnerability of the people that are affected.
Abstract:Retaining persons living with HIV (PLWH) in medical care is paramount to preventing new transmissions of the virus and allowing PLWH to live normal and healthy lifespans. Maintaining regular appointments with an HIV provider and taking medication daily for a lifetime is exceedingly difficult. 51% of PLWH are non-adherent with their medications and eventually drop out of medical care. Current methods of re-linking individuals to care are reactive (after a patient has dropped-out) and hence not very effective. We describe our system to predict who is most at risk to drop-out-of-care for use by the University of Chicago HIV clinic and the Chicago Department of Public Health. Models were selected based on their predictive performance under resource constraints, stability over time, as well as fairness. Our system is applicable as a point-of-care system in a clinical setting as well as a batch prediction system to support regular interventions at the city level. Our model performs 3x better than the baseline for the clinical model and 2.3x better than baseline for the city-wide model. The code has been released on github and we hope this methodology, particularly our focus on fairness, will be adopted by other clinics and public health agencies in order to curb the HIV epidemic.
Abstract:Like medicine, psychology, or education, data science is fundamentally an applied discipline, with most students who receive advanced degrees in the field going on to work on practical problems. Unlike these disciplines, however, data science education remains heavily focused on theory and methods, and practical coursework typically revolves around cleaned or simplified data sets that have little analog in professional applications. We believe that the environment in which new data scientists are trained should more accurately reflect that in which they will eventually practice and propose here a data science master's degree program that takes inspiration from the residency model used in medicine. Students in the suggested program would spend three years working on a practical problem with an industry, government, or nonprofit partner, supplemented with coursework in data science methods and theory. We also discuss how this program can also be implemented in shorter formats to augment existing professional masters programs in different disciplines. This approach to learning by doing is designed to fill gaps in our current approach to data science education and ensure that students develop the skills they need to practice data science in a professional context and under the many constraints imposed by that context.