Abstract:Online reinforcement learning (RL) algorithms offer great potential for personalizing treatment for participants in clinical trials. However, deploying an online, autonomous algorithm in the high-stakes healthcare setting makes quality control and data quality especially difficult to achieve. This paper proposes algorithm fidelity as a critical requirement for deploying online RL algorithms in clinical trials. It emphasizes the responsibility of the algorithm to (1) safeguard participants and (2) preserve the scientific utility of the data for post-trial analyses. We also present a framework for pre-deployment planning and real-time monitoring to help algorithm developers and clinical researchers ensure algorithm fidelity. To illustrate our framework's practical application, we present real-world examples from the Oralytics clinical trial. Since Spring 2023, this trial successfully deployed an autonomous, online RL algorithm to personalize behavioral interventions for participants at risk for dental disease.
Abstract:Geomagnetic storms (GS) occur when solar winds disrupt Earth's magnetosphere. GS can cause severe damages to satellites, power grids, and communication infrastructures. Estimate of direct economic impacts of a large scale GS exceeds $40 billion a day in the US. Early prediction is critical in preventing and minimizing the hazards. However, current methods either predict several hours ahead but fail to identify all types of GS, or make predictions within short time, e.g., one hour ahead of the occurrence. This work aims to predict all types of geomagnetic storms reliably and as early as possible using big data and machine learning algorithms. By fusing big data collected from multiple ground stations in the world on different aspects of solar measurements and using Random Forests regression with feature selection and downsampling on minor geomagnetic storm instances (which carry majority of the data), we are able to achieve an accuracy of 82.55% on data collected in 2021 when making early predictions three hours in advance. Given that important predictive features such as historic Kp indices are measured every 3 hours and their importance decay quickly with the amount of time in advance, an early prediction of 3 hours ahead of time is believed to be close to the practical limit.
Abstract:Cancer is the second leading cause of death in the world. Diagnosing cancer early on can save many lives. Pathologists have to look at tissue microarray (TMA) images manually to identify tumors, which can be time-consuming, inconsistent and subjective. Existing algorithms that automatically detect tumors have either not achieved the accuracy level of a pathologist or require substantial human involvements. A major challenge is that TMA images with different shapes, sizes, and locations can have the same score. Learning staining patterns in TMA images requires a huge number of images, which are severely limited due to privacy concerns and regulations in medical organizations. TMA images from different cancer types may have common characteristics that could provide valuable information, but using them directly harms the accuracy. Transfer learning is adopted to increase the training sample size by extracting knowledge from tissue images from different cancer types. Transfer learning has made it possible for the algorithm to break the critical accuracy barrier. The proposed algorithm reports an accuracy of 75.9% on breast cancer TMA images from the Stanford Tissue Microarray Database, achieving the 75% accuracy level of pathologists. This will allow pathologists to confidently use automatic algorithms to assist them in recognizing tumors consistently with a higher accuracy in real time.